Data Engineering with Azure Data Factory: Linked Services

Data engineers are responsible for creating and maintaining the data pipelines that supply data analysts, data scientists, ML engineers, and business analysts with the data they need to do their work. Creating a data pipeline is a combination of extracting, transforming, and loading data. Modern companies record their data across a wide variety of APIs, databases, and file types, so building generic pipelines to extract and transform data in this context is complex work. Data engineers commonly use programming languages such as Scala, Python, and Java, along with many other tools, to build these pipelines. The Azure cloud offers its own tool for the ETL process: Azure Data Factory.

Linked Services

Linked services are the connections on the source side of the pipeline and/or the connections on the sink (loading destination) side. There can be more than one linked service on either side: there may be several sources from which you have to extract data into the pipeline, and several destinations where you need to insert or update the data that flows through it. Azure Data Factory already offers users a rich collection of supported services (a minimal code sketch follows the figure below).

Multiple linked services on either side
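To make the idea concrete, here is a minimal sketch of defining a linked service programmatically, assuming the azure-mgmt-datafactory and azure-identity Python SDKs; the subscription ID, resource group, factory name, and storage connection string are all placeholder assumptions, not values from this article.

```python
# Minimal sketch: registering an Azure Blob Storage linked service
# with the azure-mgmt-datafactory SDK. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# A linked service wraps a typed connection definition; this one can
# act as either a source or a sink in later pipelines.
blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
adf_client.linked_services.create_or_update(
    "<resource-group>", "<factory-name>", "BlobStorageLinkedService", blob_ls
)
```

The same create_or_update call is used for every connector type; only the properties model changes with the service you pick.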

Services available as Azure linked services

Azure Data Factory provides two basic kinds of linked services: data stores and compute. The data stores currently available are:

  • Azure services:
    • Azure Blob Storage
    • Azure Cosmos DB (MongoDB API)
    • Azure Cosmos DB (SQL API)
    • Azure Data Explorer
    • Azure Data Lake Storage Gen1
    • Azure Data Lake Storage Gen2
    • Azure Database for MariaDB
    • Azure Database for MySQL
    • Azure Database for PostgreSQL
    • Azure Databricks Delta Lake
    • Azure File Storage
    • Azure Key Vault
    • Azure SQL Database
    • Azure SQL Database Managed Instance
    • Azure Search
    • Azure Synapse Analytics
    • Azure Table Storage
  • Database
    • Amazon RDS for Oracle
    • Amazon RDS for SQL Server
    • Amazon Redshift
    • DB2
    • Drill
    • Google AdWords
    • Google BigQuery
    • Greenplum
    • HBase
    • Hive
    • Informix
    • MariaDB
    • MS Access
    • MySQL
    • Netezza
    • Oracle
    • Phoenix
    • PostgreSQL
    • Presto
    • SAP BW Open Hub
    • SAP BW via MDX
    • SAP HANA
    • SAP Table
    • SQL Server
    • Spark
    • Sybase
    • Teradata
    • Vertica
  • File
    • Amazon S3
    • Amazon S3 Compatible
    • FTP
    • File system
    • Google Cloud Storage (S3 API)
    • HDFS
    • HTTP
    • Oracle Cloud Storage (S3 API)
    • SFTP
  • Generic protocol
    • ODBC
    • OData
    • REST
    • SharePoint Online List
  • NoSQL
    • Cassandra
    • Couchbase
    • MongoDB
    • MongoDB Atlas
  • Services and apps
    • Amazon Marketplace Web Service
    • Concur (Preview)
    • Dataverse (Common Data Service for Apps)
    • Dynamics 365
    • Dynamics AX
    • Dynamics CRM
    • GitHub
    • HubSpot
    • Jira
    • Magento
    • Marketo
    • Oracle Eloqua
    • Oracle Responsys
    • Oracle Service Cloud
    • PayPal
    • QuickBooks
    • SAP ECC
    • Salesforce
    • Snowflake
    • Web Table
The compute services available in Azure Data Factory are:
  • Azure Batch
  • Azure Data Lake Analytics
  • Azure Databricks
  • Azure Function
  • Azure HDInsight
  • Azure Machine Learning
  • Azure Machine Learning Studio (classic)

Create linked services

Before creating a data factory you need an Azure subscription. You can then create a resource group, which is a kind of folder in Azure that lets you arrange and manage related resources in one place. After that, search for "data factory" in the Azure portal search bar. (If you prefer code to the portal, a hedged sketch of creating the resource group follows.)
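A minimal sketch of the same step, assuming the azure-mgmt-resource Python SDK; the group name adf-demo-rg, the region, and the subscription ID are placeholder assumptions.

```python
# Sketch: creating the resource group from code instead of the portal.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

resource_client = ResourceManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# A resource group is just a named container pinned to a region.
resource_client.resource_groups.create_or_update(
    "adf-demo-rg", {"location": "eastus"}
)
```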

Search in the Azure portal for data factory

Step 1:

On this page, you can create a new data factory or manage the data factories you already have. To create a new data factory, click the Create button; it directs you to the data factory creation window.

Data Factory home

Step 2:

You can use an existing resource group or create a new one.



Step 3:

We can connect a version control system (GitHub or Azure DevOps) to our data factory; this can also be configured later in Azure Data Factory Studio. After filling in the relevant information, you can review and create the data factory. After deployment, you can go to the Azure Data Factory home. (The factory itself can also be created from code, as sketched below.)
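A hedged sketch of creating the factory with the azure-mgmt-datafactory Python SDK; the factory name adf-demo-factory, the resource group adf-demo-rg, and the region are assumptions, and Git integration is left for Studio as described above.

```python
# Sketch: creating the data factory itself from code.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Create (or update) the factory inside the resource group; the
# GitHub / Azure DevOps repo can be attached later in Studio.
factory = adf_client.factories.create_or_update(
    "adf-demo-rg", "adf-demo-factory", Factory(location="eastus")
)
print(factory.provisioning_state)
```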


Step 4:

From the data factory home, we can manage and monitor the data factory. To create linked services, use Azure Data Factory Studio.



Step 5:

Go to Manage.


In this interface, there are a few things we should discuss before moving to the next step.
  1. Ingest - copy data once or on a schedule.
  2. Orchestrate - create pipelines without coding.
  3. Transform data - apply data transformation steps to a pipeline.
  4. Configure SSIS - manage and run SSIS packages in the cloud.
You can use either Orchestrate or Author to reach the pipeline creation canvas. Monitor is used to monitor pipelines and triggers after they are published. Manage is the easiest way to create and manage the linked services and triggers used in this data factory.

Step 6:

To create either a source-side or a sink-side connection, first click the New button and then choose the appropriate service from the service panel pop-up. If you are creating a source-side connection, add your connection details in the connection form and check whether it connects with the Test connection button. The same steps create a sink-side connection: add your sink-side connection details to the chosen service. (The SDK equivalent of this flow is sketched below.)
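For reference, a hedged sketch of inspecting a factory's linked services with the Python SDK; note that the Test connection button is a Studio feature with no direct single-call SDK equivalent, and the resource names are the placeholder ones used above.

```python
# Sketch: every connector, source- or sink-side, is managed through
# the same linked_services operations, whatever service you pick.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# List the linked services already defined in the factory; newly
# created source- and sink-side connections appear here.
for ls in adf_client.linked_services.list_by_factory(
    "adf-demo-rg", "adf-demo-factory"
):
    print(ls.name, type(ls.properties).__name__)
```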

Step 6.1



Step 6.2


These are the available services; you can choose your source-side connection types and sink-side connections from them.

Step 6.3 - I chose an Oracle database as my source side. This depends entirely on your own source.



Azure then asks for your credentials for this source. After filling them in, you can test whether the connection works by clicking the Test connection button. (A code sketch of the same source-side linked service follows.)
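A hedged sketch of the Oracle source-side linked service via the Python SDK; the connection string fields are placeholders, and the exact format depends on your Oracle setup.

```python
# Sketch: the Oracle source-side linked service, created from code.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    OracleLinkedService,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# The credentials entered in the portal form go into the connection
# string here; SecureString keeps them out of the plain JSON definition.
oracle_ls = LinkedServiceResource(
    properties=OracleLinkedService(
        connection_string=SecureString(
            value="Host=<host>;Port=1521;Sid=<sid>;User Id=<user>;Password=<password>"
        )
    )
)
adf_client.linked_services.create_or_update(
    "adf-demo-rg", "adf-demo-factory", "OracleSourceLinkedService", oracle_ls
)
```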



Step 6.4 - My choice on the sink side is Azure SQL Database; again, this depends on your use case and your requirements.


Add your credentials to the sink-side connection as well; the credential types depend on the service you choose.

Note: Before you create a sink-side connection, you have to create the resources needed for your chosen sink. In my case, I first have to create an Azure SQL server and database before I can connect them to the data factory. (A code sketch of the sink-side linked service follows.)
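And a matching hedged sketch of the Azure SQL Database sink-side linked service; the server, database, and login values are placeholders, and the server and database must already exist, as noted above.

```python
# Sketch: the Azure SQL Database sink-side linked service.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# The Azure SQL server and database must exist before this linked
# service can reach them.
sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(
            value=(
                "Server=tcp:<server>.database.windows.net,1433;"
                "Database=<database>;User ID=<user>;Password=<password>;"
                "Encrypt=True;Connection Timeout=30"
            )
        )
    )
)
adf_client.linked_services.create_or_update(
    "adf-demo-rg", "adf-demo-factory", "AzureSqlSinkLinkedService", sql_ls
)
```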

