streamz
streamz copied to clipboard
ENH: DaskStream from set of futures
Suggest something like the equivalent of dask's as_completed
, which emits whenever one of the futures contained changes state, or on a regular frequency with the current status of all futures. This could allow
- flexible versions of the dask progress bar (i.e., we care about how many futures are in what state)
- the stream-wise accumulation of futures results, e.g., we are calculating a grand histogram over many datasets, and would like to visualise the histogram so far, as it builds.
Adding an as_completed method to DaskStream that collects futures greedily and emits them as they complete seems sensible to me.
As I write that I become somewhat concerned about the greedily modifier. This might violate back pressure. It might be important that an as_completed node has an optional maximum size, similar to buffer
I was thinking that this idea could be a Source of DaskStreams, i.e, you give it a bunch of futures from other dask calculations not necessarily originating in streamz. Those would likely self-limit the total number of items available.
Your interpretation is also useful, although would overlap with gather
.
Your interpretation is also useful, although would overlap with gather.
To be clear, Dask's as_completed does not gather futures, it merely reorders them (docs)
In that respect these seem nicely orthogonal to me.
Another orthogonal element would be dumping an iterable of things into a stream. That seems orthogonal to the as_completed behavior though.
One could imagine a version of buffer
which emits data as soon as the future is finished. This would break the ordering but could be ok for some applications.