This is a demo data engineering project built on Azure cloud. ETL pipeline implemented in Azure Data Factory is ingesting and transforming Illumina Platinum Genomes dataset. Terraform is used for provisioning infrastructure. Deployment pipelines are implemented in GitHub actions in accordance with official Microsoft guidelines, with working CICD process between DEV and PROD environments. DEV ADF is connected to git repository, and ARM templates are propagated to PROD environment through GitHub action.
Databricks visualization:
ADF pipeline execution 1:
ADF pipeline execution 2:
Official Microsoft CICD diagram:
Postgres view:
Created With
- Terraform - Infrastructure automation tool
- Databricks - Web-based processing platform for Spark
- GitHub Actions - CICD tool
- Azure - Cloud computing service owned by Microsoft