data-lake topic

List data-lake repositories
lakeFS

4.1k stars, 330 forks

lakeFS - Data version control for your data lake | Git for data

goodreads_etl_pipeline

1.2k stars, 209 forks

An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.

Udacity-Data-Engineering-Projects

1.4k stars, 464 forks

A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.

Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required

cuelake

283 stars, 28 forks

Use SQL to build ELT pipelines on a data lakehouse.

kyuubi

2.0k stars, 864 forks

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

amazon-s3-find-and-forget

233 stars, 36 forks

Amazon S3 Find and Forget is a solution for handling data erasure requests against data lakes stored on Amazon S3, for example, requests made pursuant to the European General Data Protection Regulation (GDPR).

aws-serverless-data-lake-framework

397 stars, 129 forks

Enterprise-grade, production-hardened, serverless data lake on AWS

marmaray

472 stars, 110 forks

Generic Data Ingestion & Dispersal Library for Hadoop