data-pipelines topic
fluvio
A lean and mean distributed stream processing system written in Rust and WebAssembly; an all-in-one alternative to Kafka + Flink.
dolphinscheduler
Apache DolphinScheduler is a modern data orchestration platform for building high-performance workflows with low code.
mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
CogStack-NiFi
Building data processing pipelines for document processing with NLP, using Apache NiFi and related services.
Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree.
spark-transformers
Spark-Transformers: a library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
unstructured
Open-source libraries and APIs for building custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
SmartPipeline
A framework for rapid development of robust data pipelines that follow a simple design pattern.
didact-engine
The REST API and execution engine for the Didact Platform.
spark
Performance observability for Apache Spark.