streamz
streamz copied to clipboard
ENH: DaskStream from set of futures
Suggest something like the equivalent of dask's as_completed, which emits whenever one of the futures contained changes state, or on a regular frequency with the current status of all futures. This could allow
- flexible versions of the dask progress bar (i.e., we care about how many futures are in what state)
- the stream-wise accumulation of futures results, e.g., we are calculating a grand histogram over many datasets, and would like to visualise the histogram so far, as it builds.
Adding an as_completed method to DaskStream that collects futures greedily and emits them as they complete seems sensible to me.
As I write that I become somewhat concerned about the greedily modifier. This might violate back pressure. It might be important that an as_completed node has an optional maximum size, similar to buffer
I was thinking that this idea could be a Source of DaskStreams, i.e, you give it a bunch of futures from other dask calculations not necessarily originating in streamz. Those would likely self-limit the total number of items available.
Your interpretation is also useful, although would overlap with gather.
Your interpretation is also useful, although would overlap with gather.
To be clear, Dask's as_completed does not gather futures, it merely reorders them (docs)
In that respect these seem nicely orthogonal to me.
Another orthogonal element would be dumping an iterable of things into a stream. That seems orthogonal to the as_completed behavior though.
One could imagine a version of buffer which emits data as soon as the future is finished. This would break the ordering but could be ok for some applications.