dataengineering topic

List dataengineering repositories

prodmodel

58
Stars
4
Forks

Build, test, deploy, iterate - Dev and prod tool for data science pipelines

aws-ddk

247
Stars
20
Forks

An open source development framework to help you build data workflows and modern data architecture on AWS.

SparkDataset

34
Stars
8
Forks

Instant search for and access to many datasets in PySpark.

data_engineer_interview_challenges

61
Stars
10
Forks

Found a data engineering challenge or participated in a selection process? Share it with us!

apache-spark-docker

41
Stars
23
Forks

Dockerizing an Apache Spark Standalone Cluster

pyspark-on-aws-emr

24
Stars
13
Forks

This project offers a ready-to-use AWS EMR template built on Spot Fleet and On-Demand Instances, so you can focus on writing PySpark code.

kedro-action

20
Stars
3
Forks

A GitHub Action to lint, test, build docs for, package, and run your Kedro pipelines. Supports any Python version you give it, as long as pyenv also supports it.

kedro-static-viz

27
Stars
2
Forks

Kedro CLI plugin for generating a static Kedro-Viz site (HTML, CSS, JS) that can be deployed on many serverless tools.