data-engineering-zoomcamp
Data engineering examples covering Airflow and Mage for workflow orchestration; dbt for BigQuery, Redshift, and ClickHouse; and Spark and Kafka for batch and stream processing
Data Engineering Zoomcamp
Taking the course
2024 Cohort
- Start: 15 January 2024 (Monday) at 17:00 CET
- Registration link: https://airtable.com/shr6oVXeQvSI5HuWD
- Cohort folder with homeworks and deadlines
Self-paced mode
All course materials are freely available, so you can take the course at your own pace
- Follow the suggested syllabus (see below) week by week
- You don't need to fill in the registration form. Just start watching the videos and join Slack
- Check the FAQ if you have problems
Syllabus
Module 1: Data Ingestion & Infrastructure as Code
- Python data ingestion with polars and pandas
- Rust data ingestion
- data load tool (dlt)
- Terraform for BigQuery and GCS
- Homework
Module 2: Workflow Orchestration
- Workflow Orchestration with Airflow
- Workflow Orchestration with Mage
- Workflow Orchestration with Prefect
- Homework
Module 3: Data Warehouse
- BigQuery Data Warehouse
- Lakehouse with Delta Lake/Iceberg
- Homework
Module 4: Analytics Engineering
- BigQuery and dbt
- Redshift and dbt
- Databricks and dbt
- ClickHouse and dbt
- PostgreSQL and dbt
- DuckDB and dbt
- Data Visualization with Superset/Metabase
- Homework
Module 5: Batch Processing
- PySpark
- Spark + Scala
- Spark + Kotlin (TBD)
- Homework
Module 6: Streaming
- Kafka for Stream Processing with Kotlin
- Kafka Streams with ksqlDB
- RisingWave: Streaming Database
- Homework