streamz
[Question] What is the best way to parallelize on the graph level
The dask extensions have given us the ability to parallelize at the "inside a node" level. However, some nodes can be run completely independently of one another. What is the best way to access that level of parallelism?
I'll go ahead and claim that the current approach exposes all available parallelism. We just emit tasks and the scheduler manages them as appropriate.
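To make the "emit tasks and let the scheduler manage them" idea concrete, here is a minimal sketch. It uses the standard library's `concurrent.futures` as a stand-in for a dask scheduler, so it is not the streamz dask API, and the names `load`, `total`, `branch`, and `run_graph` are hypothetical. Two branches that do not depend on each other are submitted as separate tasks, and only the final combine step waits on both.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load(name):
    """Pretend to fetch data for one branch of the graph (hypothetical)."""
    time.sleep(0.1)  # simulate I/O or computation
    return [1, 2, 3] if name == "a" else [10, 20, 30]


def total(values):
    """A downstream node that depends only on its own branch."""
    return sum(values)


def branch(name):
    """One independent branch: load, then reduce."""
    return total(load(name))


def run_graph(max_workers=2):
    """Submit both branches as tasks; the executor decides when each runs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Neither future depends on the other, so they may run concurrently.
        future_a = pool.submit(branch, "a")
        future_b = pool.submit(branch, "b")
        # The combine step is the only place that waits on both branches.
        return future_a.result() + future_b.result()


if __name__ == "__main__":
    print(run_graph())
```

Nothing in the calling code says "run these in parallel". It only states which work exists and what depends on what, and the executor (or, with dask, the scheduler) does the rest.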
Cool. I guess I need to start playing around more with the dask stuff.
I'm confident that it will break on you in many obvious and subtle ways. But in theory I think that it solves your problem just fine :)
On Wed, Aug 30, 2017 at 1:29 PM, Christopher J. Wright <[email protected]> wrote:
> Cool. I guess I need to start playing around more with the dask stuff.
Yeah, we've played with @mrocklin's dask extension, passing futures. It seems to work well, but we haven't rigorously tested it yet. (Things like saving Futures to a list and checking that repeated computations are cached work very well, including normalize_token for arbitrary objects. So far so good for scientific computations!)
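For readers who have not used normalize_token before, here is a rough sketch of how a custom class can be given a deterministic token, which is what lets dask recognize a repeated computation on an equivalent object. It assumes dask is installed. The `Sample` class and its fields are hypothetical and only there for illustration.

```python
from dask.base import normalize_token, tokenize


class Sample:
    """Hypothetical scientific object; not part of streamz or dask."""

    def __init__(self, name, temperature):
        self.name = name
        self.temperature = temperature


@normalize_token.register(Sample)
def normalize_sample(sample):
    # Describe the object by the values that define its identity,
    # so two equivalent Sample instances produce the same token.
    return ("Sample", sample.name, sample.temperature)


if __name__ == "__main__":
    first = Sample("Ni", 300)
    second = Sample("Ni", 300)
    other = Sample("Ni", 350)
    print(tokenize(first) == tokenize(second))
    print(tokenize(first) == tokenize(other))
```

With a distributed client, pure tasks submitted with equivalent arguments get the same key, so the scheduler can reuse a result that is still held by a saved Future instead of recomputing it.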