data-pipelines topic

List data-pipelines repositories

fluvio

3.8k Stars, 506 Forks

Lean and mean distributed stream processing system written in Rust and WebAssembly. An alternative to Kafka + Flink in one.

dolphinscheduler

13.1k Stars, 4.7k Forks

Apache DolphinScheduler is a modern data orchestration platform for quickly creating high-performance workflows with low code.

mage-ai

7.2k Stars, 660 Forks

🧙 Build, run, and manage data pipelines for integrating and transforming data.

CogStack-NiFi

34 Stars, 16 Forks

Building data processing pipelines for document processing with NLP, using Apache NiFi and related services.

Udacity-Data-Engineer-nanodegree

72 Stars, 72 Forks

Classwork projects and homework completed for the Udacity Data Engineering Nanodegree.

spark-transformers

39 Stars, 29 Forks

Spark-Transformers: a library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.

unstructured

8.6k Stars, 702 Forks

Open source libraries and APIs for building custom preprocessing pipelines for labeling, training, or production machine learning.

SmartPipeline

23 Stars, 2 Forks

A framework for rapid development of robust data pipelines following a simple design pattern.

didact-engine

48 Stars, 0 Forks

The REST API and execution engine for the Didact Platform.

spark

129 Stars, 10 Forks

Performance observability for Apache Spark.