Azure Data Catalog and Power BI
ADC and Power BI mix to generate a Corporate BI Suite
Some time ago there are emerging services that are very interesting tools to a goal and organized a corporate level to Power BI deployment.
I’ve been lately involved in projects of high scale in terms of deployment of Power BI, in which we have defined Roles (Readers, Contributors, Analysts, Power BI Champions, IT Administrators), Members Apps Workspaces, Configuring Power BI Premium with differentiated capabilities, Readers Group for Workspaces Premium, and strategies for Governance and Deployment.
With this in mind we can be align with the objectives of the whole organization and the ability to develop a real culture of Data Analytics with information set up, and also process definitions to provide some way that allows business analyst generating data sources to implements reports in a timely manner for presentation on a corporate level.
With different design meetings where we engage architecture models, and data sources, and that is what I hope to share in this post.
I set up a DEMO with this proof of concept that consists on: Azure Data Catalog and Power BI.
Let mention some capabilities and definitions of each of these services and why they were selected at this stage of design.
Data Catalog Azure is a cloud service that allows users to discover corporate data sources and somehow exploit different sources of information within the organization.
With Data Catalog, the user (Analyst, Data Scientist, Developer) can discover, understand, and consume different types of data sources. It also includes very powerful tools in terms of definition of metadata, and annotations. It becomes a single space, and centralized so that users also contribute and really generate a culture of data at the organizational level.
Use cases scenarios of Azure Data Catalog:
Registration Information Sources
- Having the ability to add sources like LOB systems (Line of Business)
- OLTP systems
- Databases business intelligence systems
We can mention many questions from analysts to generate their reports. To quote a few of them:
- To generate a new report for Human Resources. What source should I connect? Should I use a view or a table? Where is it located? In what system? What credentials?
- Where should I look for consolidated sales of my organization for the fiscal year that just ended?
- Who should I ask to access the data I need to generate my report?
One of the first steps within the team that generates the Data Catalog is collecting information from all sources, identifying and registering them.
It would be a great time to facilitate the task of finding sources.
With Azure Data Catalog, you can also share the user’s email or responsible for controlling access to data sources, tools and documentation links or free text also allows access ordering group. This simply allows a business user, who may not have permissions to access certain information to a simple request access to IT users in charge of the catalog.
Self Service – Business Intelligence
A Self Service culture will allow to address analysts to generate their own reports, lists, and dashboards without dependence on the schedule and availability of IT staff. In these cases reports combine different sources of information are generated, many of which may not have been used historically in the company, and this in turn implies that a large number of cases which source you do not know used to implement certain reports.
Azure Data Catalog comes as an option to break that cycle of discovery that is usually done manually. This means that after the first cycle where the business analyst discovers the sources of optimal data to generate certain reports the can register, and add information (metadata) to make this source easier to discover future analysts requiring such data for the implementation of similar reports. The discovery of these sources, and capability to add metadata are procedures do not have to give at the same time but Data Catalog allows work annotations by analysts as a continuous work in time where more information is added to the repository every time.
It may be an excellent and proactive task of promoting this technology, the IT team populate the repository of this catalog with the most common sources report generation and its associated metadata. This can be an excellent incentive for analysts.
Consolidating knowledge groups
At this point happens that individuals who are working for a long time in the organization can found almost naturally sources used in each report. This knowledge acquired over the years in many cases reside on SharePoint sites or documentation at best. With Data Catalog we have a centralized repository, with annotations that are making analysts and IT team, which also has the ability to make links to technical documents.
Now let’s implement an Azure Data Catalog to see the process and take control.
First, let’s open the Azure Portal, we have a corporate account to register a Data Catalog. We should use the search filter resources to look at Data Catalog:
The we have to enter to the Azure Portal, and we can generate a new catalog simply defining catalog name, subscription to use, and Location.
Then opens a screen where we define if you want to use the Free or Standard Catalog capacity (see details of benefits that gives each option)
It is time to define users the we want to add to our catalog:
We almost completing the process and we have to define Administrators for our Catalog:
In this final stage we just create the catalog and the and then it would be provisioned
Well, it is now time that we enter the catalog for general management of it and begin our task of populating it with data sources, and determine access to it.
The starting point after we provisioned, is to enter from the Resource Dashboard in our Azure Subscription:
If we look at this screen, and then click on our catalog we will open a window for details and settings of our catalog:
Our Azure Data Catalog has a comprehensive management console, where we can take control of sources of information to add to our repository, add users, manage metadata and annotations on our data.
We go through Publish Data button to see the console and begin the task of populating our repository:
In this case I will add as a DB data source generated in Azure and use it to illustrate the management of our Data Catalog:
After that we sign with our Azure account to move on and define the type of data source that will incorporate to the repository of Azure Data Catalog:
Our catalog accepts a great number of sources. Some of them are:
- Data Azure Lake
- Azure Blob Storage
- Azure Storage Directory
- Azure SQL / SQL DW
- SAP Hana
- SharePoint (lists)
This list is growing exponentially in recent months, remember that the same has happened with Power BI from the beginning where we started with 15 data sources and today we exceed 75 data sources which can connect to generate our reports.
Then we select the option of SQL Server in this case:
We will request data from the SQL database with which we are working to add it to our ADC. The server name and credentials are the data that we have to make this record of our data source.
Immediately we see all listed the databases on our server where we can select the objects to add to our Azure Data Catalog. We have the ability to add both, tables and views from our database:
In our Azure Data Catalog and on the same screen we can define if we want a Preview (20 rows) of the selected data.
We can also add a Data Profile to include statistics of our objects (maximum, minimum, average values, number of rows).
Remember in this stage that we also have the ability to define metadata for our objects in order for the analyst to find information on our catalog (in a simple and direct way).
Once we checked this source, it is added to our Azure Data Catalog
On the Home page of our Portal Azure Data Catalog we can see the objects that we recorded in our repository.
Them we have the ability to open our objects with Power BI and then processing as part of our model:
Remember to set in our BD Server the firewall with the corresponding IP not to be blocked.
As we can see on the left in the previous screen, we can see all search criteria that we apply to our catalog, and metadata associated with the different sources of information that are searchable. Those filters make our repository extremely powerful.
An example of Power BI opening one of the objects we discovered in our Azure Data Catalog:
This article attempts to represent a first approach to Azure Data Catalog and the ability not only to centralize data sources which can work at the corporate level as well as its discovery by analysts, adding metadata, links to documentation, and opening these sources directly from tools such as SQL Data Tools, Excel and Power BI.
I hope it have been of interest and soon I will add another tool to this stack.
Hope to see you soon!