add GlassFlow
What is this Python project?
GlassFlow is a serverless, Python-centric real-time data transformation solution for end-to-end data pipelines. If you use GlassFlow, you do not need Apache Kafka or Flink. Visit the docs page to learn more: https://docs.glassflow.dev/get-started/introduction
Describe features.
You can:
- Use GlassFlow out-of-the-box with any existing Python library.
- Start GlassFlow without a complex initial setup such as creating clusters.
- Skip the headache of managing partitions, shards, and worker setup.
- Define your pipeline as code using the GlassFlow CLI.
- Implement your transformation function using the GlassFlow Python SDK.
- Run your Python code locally for easy development and debugging.
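
As a rough illustration of the last two points, a transformation can be an ordinary Python function that is run and debugged locally before it goes anywhere near a pipeline. The sketch below is not taken from the GlassFlow SDK: the function name `handler`, its `(data, log)` signature, and the event fields `id` and `amount_cents` are assumptions made for this example, so check the docs for the actual contract.

```python
import logging
from datetime import datetime, timezone


def handler(data: dict, log: logging.Logger) -> dict:
    """Hypothetical transformation: enrich a single event with derived fields.

    The name and signature are assumptions for illustration, not a confirmed
    GlassFlow interface.
    """
    log.info("transforming event %s", data.get("id"))
    enriched = dict(data)  # copy so the incoming event is not mutated
    # Derive a dollar amount from the (hypothetical) cents field.
    enriched["amount_usd"] = round(data["amount_cents"] / 100, 2)
    # Record when the event was processed, in UTC.
    enriched["processed_at"] = datetime.now(timezone.utc).isoformat()
    return enriched


if __name__ == "__main__":
    # Local run: no cluster or broker, just a plain function call.
    logging.basicConfig(level=logging.INFO)
    sample = {"id": "evt-1", "amount_cents": 1999}
    print(handler(sample, logging.getLogger("local")))
```

Because the function only depends on the standard library here, it can be exercised with a normal debugger or unit test; any third-party library such as Pandas could be imported in the same way.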
GlassFlow:
- Provides a pure-Python, zero-infrastructure environment.
- Keeps your original data where it is.
- Connects live data sources.
- Ingests real-time data continuously.
- Transforms data in real time.
- Simulates your production workloads.
- Deploys your pipeline to production within minutes.
- Delivers auto-scalable serverless event streaming infrastructure.
What's the difference between this Python project and similar ones?
Most real-time data processing tools, including Kafka, are Java-based, while Python has recently become the go-to language for data science and machine learning, especially with the AI hype, because it has a rich set of libraries for data manipulation and analysis, such as Pandas. To bridge this gap, there is now a set of tools for real-time data processing in Python, such as Python wrapper APIs and libraries around JVM-based engines. However, with any of the Kafka wrappers you cannot easily simulate a production environment without a complex initial setup, such as creating computing clusters and managing partitions, shards, and worker setup.
Users also need to implement custom user-defined functions (UDFs) so that a transformation written with, say, Pandas can be translated for the Java runtime. The time spent on this translation can significantly reduce the throughput and responsiveness of real-time applications.
Enumerate comparisons.
Getting a similar PyFlink-based pipeline into production takes 6-12 months and involves several tools. GlassFlow can get your data pipeline up and running in just 15 minutes with a single tool.
--
Anyone who agrees with this pull request could submit an Approve review to it.
@MatteoGuadrini @Wisma-55 @PythonChicken123 Could you help me to review and approve this PR, please? Thanks!
@Wisma-55 Thanks! Do you know who can merge the PR here?:)
Approved these changes
@Wisma-55 No worries, can we merge the PR?