awesome-python

add GlassFlow

Open Boburmirzo opened this issue 1 year ago • 4 comments

What is this Python project?

GlassFlow is a serverless, Python-centric real-time data transformation solution for end-to-end data pipelines. If you use GlassFlow, you do not need Apache Kafka or Flink. Visit the docs page to learn more: https://docs.glassflow.dev/get-started/introduction

Describe features.

You can:

  • Use GlassFlow out-of-the-box with any existing Python library.
  • Start GlassFlow without a complex initial setup such as creating clusters.
  • Skip the headache of managing partitions, shards, and workers' setup.
  • Define your pipeline as code using the GlassFlow CLI.
  • Implement your transformation function using the GlassFlow Python SDK.
  • Run your Python code locally for easy development and debugging.
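As a rough illustration of the last two points, here is a minimal sketch of a transformation function that can be run and debugged locally with plain Python. The `handler(data, log)` signature, the event fields (`id`, `temp_c`, `temp_f`), and the sample events are assumptions made for this example; this issue does not state the exact interface GlassFlow expects, so check the docs linked above before relying on it.

```python
import logging
from typing import Any

logger = logging.getLogger("transform")


def handler(data: dict[str, Any], log: logging.Logger) -> dict[str, Any]:
    """Hypothetical transformation: add a Fahrenheit reading to an event.

    The (data, log) signature is an assumption for illustration only.
    """
    log.info("transforming event %s", data.get("id"))
    enriched = dict(data)  # shallow copy so the input event is not mutated
    enriched["temp_f"] = round(data["temp_c"] * 9 / 5 + 32, 1)
    return enriched


if __name__ == "__main__":
    # Local debugging: feed made-up sample events through the function directly.
    logging.basicConfig(level=logging.INFO)
    sample_events = [{"id": 1, "temp_c": 21.5}, {"id": 2, "temp_c": -3.0}]
    for event in sample_events:
        print(handler(event, logger))
```

Because the function is ordinary Python with no infrastructure dependency, it can be stepped through in a debugger or covered by unit tests before it is deployed.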

GlassFlow:

  • Provides a pure Python and zero infrastructure environment.
  • Keeps your original data where it is.
  • Connects live data sources.
  • Ingests real-time data continuously.
  • Transforms data in real time.
  • Simulates your production workloads.
  • Deploys your pipeline to production within minutes.
  • Delivers auto-scalable serverless event streaming infrastructure.

What's the difference between this Python project and similar ones?

Most real-time data processing tools, including Kafka, are Java-based, while Python has become the go-to language for data science and machine learning, especially amid the current AI hype, in part because of its rich set of libraries for data manipulation and analysis, such as Pandas. To bridge this gap, there are now tools for real-time data processing in Python, such as Python wrapper APIs and libraries for JVM-based systems. However, with any of the Kafka wrappers, you cannot easily simulate a production environment without a complex initial setup such as creating computing clusters and managing partitions, shards, and workers.

Users of these wrappers also need to implement a custom user-defined function (UDF) to translate a transformation written with, say, Pandas (the most popular library) into Java syntax. The time spent on this translation can significantly reduce the throughput and responsiveness of real-time applications.

Enumerate comparisons.

Getting a comparable PyFlink-based pipeline into production takes 6-12 months and involves several tools. GlassFlow can get your data pipeline up and running in just 15 minutes with a single tool.

--

Anyone who agrees with this pull request could submit an Approve review to it.

Boburmirzo Sep 20 '24 16:09

@MatteoGuadrini @Wisma-55 @PythonChicken123 Could you help me to review and approve this PR, please? Thanks!

Boburmirzo Sep 20 '24 16:09

@Wisma-55 Thanks! Do you know who can merge the PR here? :)

Boburmirzo Sep 20 '24 16:09

Approved these changes

Bib4real Oct 16 '24 20:10

@Wisma-55 No worries, can we merge the PR?

Boburmirzo Mar 21 '25 17:03