awesome-python add GlassFlow

What is this Python project?

GlassFlow is a serverless, Python-centric real-time data transformation solution for end-to-end data pipelines. With GlassFlow, you do not need Apache Kafka or Flink. Visit the docs page to learn more: https://docs.glassflow.dev/get-started/introduction

Describe features.

You can:

Use GlassFlow out-of-the-box with any existing Python library.
Start GlassFlow without a complex initial setup such as creating clusters.
Skip the headache of managing partitions, shards, and worker setup.
Define your pipeline as code using the GlassFlow CLI.
Implement your transformation function using the GlassFlow Python SDK.
Run your Python code locally for easy development and debugging.
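Because the transformation is ordinary Python, it can be written and tested locally before deployment. The sketch below is a minimal, hypothetical example. It assumes the pipeline calls a function named `handler(data, log)` with each event as a dict and forwards whatever the function returns. That signature, the field names (`timestamp`, `amount_cents`) and the sample events are illustrative assumptions, so check the GlassFlow docs for the actual contract. It uses only the standard library and needs no GlassFlow setup to run.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("local-test")


def handler(data: dict, log) -> dict:
    """Transform a single event (hypothetical example).

    Assumption: the pipeline invokes `handler` with the event payload as a
    dict plus a logger and forwards the returned dict downstream.
    """
    # Convert a Unix timestamp in seconds into an ISO-8601 UTC string.
    ts = data.get("timestamp")
    if ts is not None:
        data["event_time"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    # Replace a hypothetical integer "amount_cents" field with a decimal amount.
    if "amount_cents" in data:
        data["amount"] = data.pop("amount_cents") / 100
    log.info("transformed event %s", data.get("id"))
    return data


if __name__ == "__main__":
    # Run the function locally on a few sample events for debugging.
    sample_events = [
        {"id": 1, "timestamp": 0, "amount_cents": 1250},
        {"id": 2, "timestamp": 1700000000, "amount_cents": 99},
    ]
    for event in sample_events:
        print(handler(dict(event), logger))
```

Keeping the function free of pipeline-specific imports means the same code can be unit-tested with plain Python and reused with any other library you already depend on.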

GlassFlow:

Provides a pure-Python, zero-infrastructure environment.
Keeps your original data where it is.
Connects live data sources.
Ingests real-time data continuously.
Transforms data in real time.
Simulates your production workloads.
Deploys your pipeline to production within minutes.
Delivers auto-scalable serverless event streaming infrastructure.

What's the difference between this Python project and similar ones?

Most real-time data processing tools, including Kafka, are Java-based, whereas Python has become the go-to language for data science and machine learning, especially with the recent AI boom, because it offers a rich set of libraries for data manipulation and analysis such as Pandas. To bridge this gap, several tools now provide real-time data processing in Python, typically as Python wrapper APIs or libraries around JVM-based systems. However, none of the Kafka wrappers make it easy to simulate a production environment without a complex initial setup: creating compute clusters and managing partitions, shards, and worker setups.

With these wrappers, you also need to implement a custom user-defined function (UDF) to translate a transformation written with a popular library such as Pandas into Java. This translation step can significantly impact the throughput and responsiveness of real-time applications.

Enumerate comparisons.

Getting a comparable PyFlink-based pipeline into production can take 6-12 months and involves several tools. GlassFlow can get your data pipeline up and running in just 15 minutes with a single tool.

--

Anyone who agrees with this pull request can submit an Approve review.

Sep 20 '24 16:09 Boburmirzo

@MatteoGuadrini @Wisma-55 @PythonChicken123 Could you help me to review and approve this PR, please? Thanks!

Sep 20 '24 16:09 Boburmirzo

@Wisma-55 Thanks! Do you know who can merge the PR here?:)

Sep 20 '24 16:09 Boburmirzo

Approved these changes

Oct 16 '24 20:10 Bib4real

@Wisma-55 No worries, can we merge the PR?

Mar 21 '25 17:03 Boburmirzo