dataengineering topic
prodmodel
Build, test, deploy, iterate - Dev and prod tool for data science pipelines
aws-ddk
An open source development framework to help you build data workflows and modern data architecture on AWS.
SparkDataset
Instant search for and access to many datasets in PySpark.
data_engineer_interview_challenges
Found a data engineering challenge or participated in a selection process? Share with us!
apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
data-engineer-challenge
Challenge Data Engineer
pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing PySpark code.
airflow-docker-metrics
kedro-action
A GitHub Action to lint, test, build-docs, package, and run your kedro pipelines. Supports any Python version you give it, as long as pyenv also supports it.
kedro-static-viz
kedro CLI plugin for generating a static kedro viz site (HTML, CSS, JS) that can be deployed on many serverless tools.