datalake topic

List datalake repositories

trino

Stars: 9.6k | Forks: 2.8k

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

lakeFS

Stars: 4.1k | Forks: 329

lakeFS - Data version control for your data lake | Git for data

zingg

Stars: 890 | Forks: 109

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

HyperBian

Stars: 282 | Forks: 36

Hyperion pre-installed on Raspberry Pi OS Lite

HyperBian

Stars: 282 | Forks: 36

A free-to-use dbt package for creating and loading Data Vault 2.0-compliant data warehouses (powered by dbt, an open-source data engineering tool and a registered trademark of dbt Labs)

aws-orbit-workbench

Stars: 128 | Forks: 26

A Data Platform built for AWS, powered by Kubernetes.

doris

Stars: 11.5k | Forks: 3.1k | Watchers: 262

Apache Doris is an easy-to-use, high-performance, unified analytics database.

hudi

Stars: 5.1k | Forks: 2.4k | Watchers: 1.2k

Upserts, Deletes And Incremental Processing on Big Data.

delta-lake-internals

Stars: 177 | Forks: 36

The Internals of Delta Lake

cuelake

Stars: 283 | Forks: 28

Use SQL to build ELT pipelines on a data lakehouse.

hudi-resources

Stars: 521 | Forks: 155

A collection of Apache Hudi resources