streamz
Real-time stream processing for Python
I have some optimized Kafka code that I need to run which consumes data from Kafka with C++ and then creates GPU DataFrames (also C++ code) without having to pass...
In my current use case I spend a lot of time capturing images from IP cameras and processing those images in pipelines. Think home network security NVR with lots of...
The stateful [actors of Dask](https://distributed.dask.org/en/latest/actors.html) closely match the design of streamz nodes. If checkpoint/restart is part of the actor code, then a major deficit of the current actors model (loss...
I'm testing the `to_kafka` sink, and its throughput is limited by polltime (0.2 sec). Looks like `self.producer.poll(0)` only polls for one message at a time and so only one callback is...
Hello, I just started using your package and I find it really great! However I cannot find a direct way of computing a custom aggregation on a GroupBy dataframe, something...
Currently, we must `gather()` all the results from a Dask stream back to the master script and then push the results to Kafka. This removes all the benefits of parallel...
cover tests
Automated TTL (time-to-live) would help streaming pipelines expire SDF records based on a field, so that the SDF does not grow forever. A common use case is to expire based...
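A minimal sketch of the TTL idea in plain Python, to make the request concrete. It is not a streamz API: the `TTLBuffer` class, the `ttl` and `field` parameters, and the `"ts"` field name are all hypothetical, and a list of dicts stands in for the SDF state.

```python
class TTLBuffer:
    """Hypothetical helper: keeps only records whose timestamp field is
    within `ttl` units of the newest timestamp seen so far."""

    def __init__(self, ttl, field="ts"):
        self.ttl = ttl
        self.field = field          # name of the timestamp field (assumed)
        self.records = []           # stand-in for the accumulated state
        self.newest = float("-inf")

    def add(self, record):
        # Track the newest timestamp so out-of-order arrivals do not
        # move the expiry cutoff backwards.
        self.newest = max(self.newest, record[self.field])
        self.records.append(record)
        cutoff = self.newest - self.ttl
        # Expire everything older than the cutoff.
        self.records = [r for r in self.records if r[self.field] >= cutoff]
        return list(self.records)


buf = TTLBuffer(ttl=10)
buf.add({"ts": 0, "v": 1})
buf.add({"ts": 5, "v": 2})
kept = buf.add({"ts": 12, "v": 3})  # the record with ts=0 is now expired
```

Filtering the whole list on every insert is linear in the buffer size; a real implementation would likely keep the state ordered by the expiry field.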
This is a feature that exists in other streaming data systems with windowing and aggregation functions. Its purpose is to support late-arriving data in a window or aggregation. It...
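As a rough illustration of late-data handling in general, and not of any existing streamz API, the sketch below sums values in tumbling event-time windows and keeps each window open for an extra grace period. The class name, `window`, and `allowed_lateness` are hypothetical names chosen for this example.

```python
from collections import defaultdict


class LateTolerantWindowSum:
    """Hypothetical tumbling-window sum that accepts late events until the
    watermark (newest event time minus allowed lateness) passes the window."""

    def __init__(self, window, allowed_lateness):
        self.window = window
        self.allowed_lateness = allowed_lateness
        self.open_windows = defaultdict(float)  # window start -> running sum
        self.max_event_time = float("-inf")
        self.dropped = 0                        # events that arrived too late

    def add(self, event_time, value):
        """Add one event; return the (window_start, total) pairs it finalizes."""
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self.max_event_time - self.allowed_lateness
        start = (event_time // self.window) * self.window
        if start + self.window <= watermark:
            # The event's window has already been finalized.
            self.dropped += 1
        else:
            self.open_windows[start] += value
        closed = sorted(s for s in self.open_windows
                        if s + self.window <= watermark)
        return [(s, self.open_windows.pop(s)) for s in closed]


agg = LateTolerantWindowSum(window=10, allowed_lateness=5)
r1 = agg.add(1, 1.0)    # opens window [0, 10)
r2 = agg.add(12, 2.0)   # opens window [10, 20)
r3 = agg.add(8, 3.0)    # late, but window [0, 10) is still open
r4 = agg.add(16, 1.0)   # watermark reaches 11, so window [0, 10) is emitted
r5 = agg.add(9, 5.0)    # too late: window [0, 10) is closed, event is dropped
```

The trade-off is latency against completeness: a larger `allowed_lateness` admits more late events but delays every window's result by the same amount.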
It would be great to have some benchmarks for streamz so we can understand our performance. The custreamz group has done some awesome work on this front. @chinmaychandak any interest...