[Question] What is the best way to parallelize on the graph level

Open CJ-Wright opened this issue 7 years ago • 4 comments

The dask extensions have given us the ability to parallelize at the "inside a node" level. However, some nodes can be run completely independently of one another. What is the best way to access that level of parallelism?

CJ-Wright avatar Aug 30 '17 17:08 CJ-Wright

I'll go ahead and claim that the current approach exposes all available parallelism. We just emit tasks and the scheduler manages them as appropriate.

mrocklin avatar Aug 30 '17 17:08 mrocklin

Cool. I guess I need to start playing around more with the dask stuff.

CJ-Wright avatar Aug 30 '17 17:08 CJ-Wright

I'm confident that it will break on you in many obvious and subtle ways. But in theory I think that it solves your problem just fine :)

mrocklin avatar Aug 30 '17 17:08 mrocklin

Yeah, we've played with @mrocklin's dask extension, passing futures. It seems to work well, but we haven't rigorously tested it yet. (Things like saving Futures to a list and checking that repeated computations are cached work very well, including normalize_token for arbitrary objects. So far so good for scientific computations!)

jrmlhermitte avatar Sep 23 '17 22:09 jrmlhermitte
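
For anyone landing here later, a rough sketch of the "just emit tasks and let the scheduler sort it out" idea from the thread. This is not streamz or dask code: it uses the standard library's `ThreadPoolExecutor` as a stand-in for the dask scheduler, and `branch_a`, `branch_b`, `combine`, and `run_graph` are made-up names for illustration. The point is only that two nodes with no dependency on each other can both be submitted up front, and the downstream node waits on their futures.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def branch_a(x):
    # Hypothetical node: independent of branch_b.
    time.sleep(0.1)  # pretend this is real work
    return x + 1


def branch_b(x):
    # Hypothetical node: independent of branch_a.
    time.sleep(0.1)  # pretend this is real work
    return x * 2


def combine(a, b):
    # Downstream node that needs both branch results.
    return a + b


def run_graph(x):
    # The executor plays the role of the scheduler here (an assumption
    # for this sketch; the thread itself is about dask).
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both tasks are emitted before either result is requested,
        # so the pool is free to run them at the same time.
        fut_a = pool.submit(branch_a, x)
        fut_b = pool.submit(branch_b, x)
        return combine(fut_a.result(), fut_b.result())


result = run_graph(10)
print(result)
```

`dask.distributed`'s `Client.submit` has a similar submit-and-get-a-future shape, so the same pattern should carry over, but that part is untested here.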