data-pipeline topic

List data-pipeline repositories

go-streams

1.8k
Stars
146
Forks

A lightweight stream processing library for Go

seatunnel

7.5k
Stars
1.6k
Forks
172
Watchers

SeaTunnel is a next-generation, high-performance, distributed tool for massive data integration.

conduit

353
Stars
41
Forks

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

cuelake

283
Stars
28
Forks

Use SQL to build ELT pipelines on a data lakehouse.

whylogs

2.6k
Stars
118
Forks

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collecti...

pansori

138
Stars
28
Forks

Tools for ASR Corpus Generation from Online Video

DataEngineeringProject

1.0k
Stars
208
Forks

Example end-to-end data engineering project.

aws-pdf-textract-pipeline

159
Stars
19
Forks

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript.

covalent

708
Stars
85
Forks

Pythonic tool for orchestrating machine-learning, high-performance, and quantum-computing workflows in heterogeneous compute environments.

klio

832
Stars
47
Forks

Smarter data pipelines for audio.