data-pipeline topic
go-streams
A lightweight stream processing library for Go
seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
conduit
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
cuelake
Use SQL to build ELT pipelines on a data lakehouse.
whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collecti...
pansori
Tools for ASR Corpus Generation from Online Video
DataEngineeringProject
Example end-to-end data engineering project.
aws-pdf-textract-pipeline
Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript
covalent
Pythonic tool for orchestrating machine-learning, high-performance, and quantum-computing workflows in heterogeneous compute environments.
klio
Smarter data pipelines for audio.