dataengineering topic

List dataengineering repositories

pypi-duck-flow

119
Stars
18
Forks

End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence

moose

22
Stars
4
Forks

The developer framework for your data & analytics stack

run-a-data-team

46
Stars
2
Forks

A guide for leading a data (engineering) team

Prescriber-ETL-data-pipeline

15
Stars
3
Forks

An end-to-end ETL data pipeline that uses PySpark parallel processing to handle about 25 million rows of data from a SaaS application, with Apache Airflow as the orchestration tool and var...

RealtimeStreamingEngineering

26
Stars
16
Forks

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka and Elasticsearch. It covers each stage from data...

FootballDataEngineering

16
Stars
14
Forks

An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Facto...

SparkingFlow

25
Stars
17
Forks

This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala and Java as examples.