Data engineering examples covering Airflow and Mage for workflow orchestration; dbt for BigQuery, Redshift, and ClickHouse; and Spark and Kafka for batch and stream processing

Data Engineering Zoomcamp

Taking the course

2024 Cohort

  • Start: 15 January 2024 (Monday) at 17:00 CET
  • Registration link: https://airtable.com/shr6oVXeQvSI5HuWD
  • Cohort folder with homework and deadlines

Self-paced mode

All course materials are freely available, so you can take the course at your own pace.

  • Follow the suggested syllabus (see below) week by week
  • You don't need to fill in the registration form. Just start watching the videos and join Slack
  • Check the FAQ if you have problems

Syllabus

Module 1: Data Ingestion & Infrastructure as Code

  • Python data ingestion with polars and pandas
  • Rust data ingestion
  • data load tool (dlt)
  • Terraform for BigQuery and GCS
  • Homework

Module 2: Workflow Orchestration

  • Workflow Orchestration with Airflow
  • Workflow Orchestration with Mage
  • Workflow Orchestration with Prefect
  • Homework

Module 3: Data Warehouse

  • BigQuery Data Warehouse
  • Lakehouse with Delta Lake/Iceberg
  • Homework

Module 4: Analytics Engineering

  • BigQuery and dbt
  • Redshift and dbt
  • Databricks and dbt
  • ClickHouse and dbt
  • PostgreSQL and dbt
  • DuckDB and dbt
  • Data Visualization with Superset/Metabase
  • Homework

Module 5: Batch Processing

  • PySpark
  • Spark + Scala
  • Spark + Kotlin (TBD)
  • Homework

Module 6: Streaming

  • Kafka for Stream Processing with Kotlin
  • Kafka Streams with ksqlDB
  • RisingWave: Streaming Database
  • Homework