This is a demo data engineering project built on Azure cloud. ETL pipeline implemented in Azure Data Factory is ingesting and transforming Illumina Platinum Genomes dataset. Terraform is used for provisioning infrastructure. Deployment pipelines are implemented in GitHub actions in accordance with official Microsoft guidelines, with working CICD process between DEV and PROD environments. DEV ADF is connected to git repository, and ARM templates are propagated to PROD environment through GitHub action.

Databricks visualization: alt text

ADF pipeline execution 1: alt text

ADF pipeline execution 2: alt text

Official Microsoft CICD diagram: alt text

Postgres view:
alt text

Created With

GitHub repository