streamz icon indicating copy to clipboard operation
streamz copied to clipboard

ENH: DaskStream from set of futures

Open martindurant opened this issue 6 years ago • 4 comments

Suggest something like the equivalent of dask's as_completed, which emits whenever one of the futures contained changes state, or on a regular frequency with the current status of all futures. This could allow

  • flexible versions of the dask progress bar (i.e., we care about how many futures are in what state)
  • the stream-wise accumulation of futures results, e.g., we are calculating a grand histogram over many datasets, and would like to visualise the histogram so far, as it builds.

martindurant avatar Apr 05 '18 14:04 martindurant

Adding an as_completed method to DaskStream that collects futures greedily and emits them as they complete seems sensible to me.

As I write that I become somewhat concerned about the greedily modifier. This might violate back pressure. It might be important that an as_completed node has an optional maximum size, similar to buffer

mrocklin avatar Apr 05 '18 14:04 mrocklin

I was thinking that this idea could be a Source of DaskStreams, i.e, you give it a bunch of futures from other dask calculations not necessarily originating in streamz. Those would likely self-limit the total number of items available. Your interpretation is also useful, although would overlap with gather.

martindurant avatar Apr 05 '18 14:04 martindurant

Your interpretation is also useful, although would overlap with gather.

To be clear, Dask's as_completed does not gather futures, it merely reorders them (docs)

In that respect these seem nicely orthogonal to me.

Another orthogonal element would be dumping an iterable of things into a stream. That seems orthogonal to the as_completed behavior though.

mrocklin avatar Apr 05 '18 14:04 mrocklin

One could imagine a version of buffer which emits data as soon as the future is finished. This would break the ordering but could be ok for some applications.

CJ-Wright avatar Aug 31 '19 18:08 CJ-Wright