datalake topic

List datalake repositories

trino

Stars: 9.7k, Forks: 2.8k

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

lakeFS

Stars: 4.1k, Forks: 330

lakeFS - Data version control for your data lake | Git for data

zingg

Stars: 902, Forks: 109

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

automate-dv

Stars: 466, Forks: 113

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

aws-orbit-workbench

Stars: 128, Forks: 26

A Data Platform built for AWS, powered by Kubernetes.

doris

Stars: 11.6k, Forks: 3.1k, Watchers: 262

Apache Doris is an easy-to-use, high performance and unified analytics database.

hudi

Stars: 5.1k, Forks: 2.4k, Watchers: 1.2k

Upserts, Deletes And Incremental Processing on Big Data.

delta-lake-internals

Stars: 177, Forks: 36

The Internals of Delta Lake

cuelake

Stars: 283, Forks: 28

Use SQL to build ELT pipelines on a data lakehouse.

hudi-resources

Stars: 525, Forks: 154

A collection of Apache Hudi resources