datalake topic

List datalake repositories

deeplake

7.8k stars, 599 forks

Database for AI. Store vectors, images, texts, videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real time to PyTorch/TensorFlow. https://activeloop....

matano

1.4k stars, 91 forks

Open-source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS.

dinky

2.9k stars, 1.0k forks, 36 watchers

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

LakeSoul

2.3k stars, 419 forks

LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage, for both BI and AI applications.

amoro

719 stars, 251 forks

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Real-time-Data-Warehouse

100 stars, 40 forks

Real-time data warehouse built with Apache Flink, Apache Kafka, and Apache Hudi.

ApacheSpark

82 stars, 59 forks

This repository will help you learn Databricks concepts through examples. It will include all the important topics we need in real-life work as data engineers. We wil...

Streamis

97 stars, 42 forks

Streaming application development and management system based on Linkis and DSS, which plans to provide workflow-like, graphical drag-and-drop development.