Data Engineering with Azure Data Factory: Linked Services

Data engineers are responsible for creating and maintaining the data pipelines that supply data analysts, data scientists, ML engineers, and business analysts with the data they need to do their work. Creating a data pipeline is a combination of extracting, transforming, and loading data. Modern companies record their data across a wide variety of APIs, databases, and file types, so building generic pipelines to extract and transform data in this context is complex work. Data engineers commonly use programming languages such as Scala, Python, and Java, along with many other tools, to build these pipelines. The Azure cloud offers its own tool for the ETL process: Azure Data Factory.

Linked Services

Linked services are the connections on the source side of the pipeline and/or the connections on the sink (loading destination) side. There can be more than one linked service on either side: there may be several sources from which you have to extract data into the pipeline, and several destinations where you need to insert or update the data that flows through it. Azure Data Factory already offers users a rich collection of supported services (a minimal code sketch follows the figure below).

Multiple linked services on either side
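To make the idea concrete, here is a minimal sketch of defining a linked service programmatically, assuming the azure-mgmt-datafactory and azure-identity Python SDKs; the subscription ID, resource group, factory name, and storage connection string are all placeholder assumptions, not values from this article.

```python
# Minimal sketch: registering an Azure Blob Storage linked service
# with the azure-mgmt-datafactory SDK. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# A linked service wraps a typed connection definition; this one can
# act as either a source or a sink in later pipelines.
blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
adf_client.linked_services.create_or_update(
    "<resource-group>", "<factory-name>", "BlobStorageLinkedService", blob_ls
)
```

The same create_or_update call is used for every connector type; only the properties model changes with the service you pick.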

Services available as Azure linked services

Azure Data Factory provides two basic kinds of linked services: data stores and compute. The data stores currently available are:

  • Azure services:
    • Azure Blob Storage
    • Azure Cosmos DB (MongoDB API)
    • Azure Cosmos DB (SQL API)
    • Azure Data Explorer
    • Azure Data Lake Storage Gen1
    • Azure Data Lake Storage Gen2
    • Azure Database for MariaDB
    • Azure Database for MySQL
    • Azure Database for PostgreSQL
    • Azure Databricks Delta Lake
    • Azure File Storage
    • Azure Key Vault
    • Azure SQL Database
    • Azure SQL Database Managed Instance
    • Azure Search
    • Azure Synapse Analytics
    • Azure Table Storage
  • Database
    • Amazon RDS for Oracle
    • Amazon RDS for SQL Server
    • Amazon Redshift
    • DB2
    • Drill
    • Google AdWords
    • Google BigQuery
    • Greenplum
    • HBase
    • Hive
    • Informix
    • MariaDB
    • MS Access
    • MySQL
    • Netezza
    • Oracle
    • Phoenix
    • PostgreSQL
    • Presto
    • SAP BW Open Hub
    • SAP BW via MDX
    • SAP HANA
    • SAP Table
    • SQL Server
    • Spark
    • Sybase
    • Teradata
    • Vertica
  • File
    • Amazon S3
    • Amazon S3 Compatible
    • FTP
    • File system
    • Google Cloud Storage (S3 API)
    • HDFS
    • HTTP
    • Oracle Cloud Storage (S3 API)
    • SFTP
  • Generic protocol
    • ODBC
    • OData
    • REST
    • SharePoint Online List
  • NoSQL
    • Cassandra
    • Couchbase
    • MongoDB
    • MongoDB Atlas
  • Services and apps
    • Amazon Marketplace Web Service
    • Concur (Preview)
    • Dataverse (Common Data Service for Apps)
    • Dynamics 365
    • Dynamics AX
    • Dynamics CRM
    • GitHub
    • HubSpot
    • Jira
    • Magento
    • Marketo
    • Oracle Eloqua
    • Oracle Responsys
    • Oracle Service Cloud
    • PayPal
    • QuickBooks
    • SAP ECC
    • Salesforce
    • Snowflake
    • Web Table
The compute services available in Azure Data Factory are:
  • Azure Batch
  • Azure Data Lake Analytics
  • Azure Databricks
  • Azure Function
  • Azure HDInsight
  • Azure Machine Learning
  • Azure Machine Learning Studio (classic)

Create linked services

Before creating a data factory you need an Azure subscription. You can then create a resource group, which is a kind of folder in Azure that lets you arrange and manage related resources in one place. After that, search for "data factory" in the Azure portal search bar. (If you prefer code to the portal, a hedged sketch of creating the resource group follows.)
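A minimal sketch of the same step, assuming the azure-mgmt-resource Python SDK; the group name adf-demo-rg, the region, and the subscription ID are placeholder assumptions.

```python
# Sketch: creating the resource group from code instead of the portal.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

resource_client = ResourceManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# A resource group is just a named container pinned to a region.
resource_client.resource_groups.create_or_update(
    "adf-demo-rg", {"location": "eastus"}
)
```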

Search in the Azure portal for data factory

Step 1:

On this page, you can create a new data factory or manage the data factories you already have. To create a new data factory, click the Create button; it directs you to the data factory creation window.

Data Factory home

Step 2:

You can use an existing resource group or create a new one.



Step 3:

We can connect a version control system (GitHub or Azure DevOps) to our data factory; this can also be configured later in Azure Data Factory Studio. After filling in the relevant information, you can review and create the data factory. After deployment, you can go to the Azure Data Factory home. (The factory itself can also be created from code, as sketched below.)
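A hedged sketch of creating the factory with the azure-mgmt-datafactory Python SDK; the factory name adf-demo-factory, the resource group adf-demo-rg, and the region are assumptions, and Git integration is left for Studio as described above.

```python
# Sketch: creating the data factory itself from code.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Create (or update) the factory inside the resource group; the
# GitHub / Azure DevOps repo can be attached later in Studio.
factory = adf_client.factories.create_or_update(
    "adf-demo-rg", "adf-demo-factory", Factory(location="eastus")
)
print(factory.provisioning_state)
```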


Step 4:

From the data factory home, we can manage and monitor the data factory. To create linked services, use Azure Data Factory Studio.



Step 5:

Go to Manage.


In this interface, there are a few things we should discuss before moving to the next step.
  1. Ingest - copy data once or on a schedule.
  2. Orchestrate - create pipelines without coding.
  3. Transform data - apply data transformation steps to a pipeline.
  4. Configure SSIS - manage and run SSIS packages in the cloud.
You can use either Orchestrate or Author to reach the pipeline creation canvas. Monitor is used to monitor pipelines and triggers after they are published. Manage is the easiest way to create and manage the linked services and triggers used in this data factory.

Step 6:

To create either a source-side or a sink-side connection, first click the New button and then choose the appropriate service from the service panel pop-up. If you are creating a source-side connection, add your connection details in the connection form and check whether it connects with the Test connection button. The same steps create a sink-side connection: add your sink-side connection details to the chosen service. (The SDK equivalent of this flow is sketched below.)
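For reference, a hedged sketch of inspecting a factory's linked services with the Python SDK; note that the Test connection button is a Studio feature with no direct single-call SDK equivalent, and the resource names are the placeholder ones used above.

```python
# Sketch: every connector, source- or sink-side, is managed through
# the same linked_services operations, whatever service you pick.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# List the linked services already defined in the factory; newly
# created source- and sink-side connections appear here.
for ls in adf_client.linked_services.list_by_factory(
    "adf-demo-rg", "adf-demo-factory"
):
    print(ls.name, type(ls.properties).__name__)
```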

Step 6.1



Step 6.2


These are the available services; you can choose your source-side connection types and sink-side connections from them.

Step 6.3 - I chose an Oracle database as my source side. This depends entirely on your own source.



Azure then asks for your credentials for this source. After filling them in, you can test whether the connection works by clicking the Test connection button. (A code sketch of the same source-side linked service follows.)
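A hedged sketch of the Oracle source-side linked service via the Python SDK; the connection string fields are placeholders, and the exact format depends on your Oracle setup.

```python
# Sketch: the Oracle source-side linked service, created from code.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    OracleLinkedService,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# The credentials entered in the portal form go into the connection
# string here; SecureString keeps them out of the plain JSON definition.
oracle_ls = LinkedServiceResource(
    properties=OracleLinkedService(
        connection_string=SecureString(
            value="Host=<host>;Port=1521;Sid=<sid>;User Id=<user>;Password=<password>"
        )
    )
)
adf_client.linked_services.create_or_update(
    "adf-demo-rg", "adf-demo-factory", "OracleSourceLinkedService", oracle_ls
)
```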



Step 6.4 - My choice on the sink side is Azure SQL Database; again, this depends on your use case and your requirements.


Add your credentials to the sink-side connection as well; the credential types depend on the service you choose.

Note: Before you create a sink-side connection, you have to create the resources needed for your chosen sink. In my case, I first have to create an Azure SQL server and database before I can connect them to the data factory. (A code sketch of the sink-side linked service follows.)
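And a matching hedged sketch of the Azure SQL Database sink-side linked service; the server, database, and login values are placeholders, and the server and database must already exist, as noted above.

```python
# Sketch: the Azure SQL Database sink-side linked service.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# The Azure SQL server and database must exist before this linked
# service can reach them.
sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(
            value=(
                "Server=tcp:<server>.database.windows.net,1433;"
                "Database=<database>;User ID=<user>;Password=<password>;"
                "Encrypt=True;Connection Timeout=30"
            )
        )
    )
)
adf_client.linked_services.create_or_update(
    "adf-demo-rg", "adf-demo-factory", "AzureSqlSinkLinkedService", sql_ls
)
```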

