Dagger.jl
Add streaming API
Adds a `spawn_streaming` task queue that transforms tasks into continuously-executing equivalents, which automatically `take!` from input streams/channels and `put!` their results to an output stream/channel. Useful for processing many individual elements of some large (or infinite) collection.
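To make the above concrete, a minimal pipeline might look like the sketch below (based on the Dagger streaming docs; the exact surface API may differ slightly from this PR's intermediate state):

```julia
using Dagger

# Inside spawn_streaming(), each @spawn'd task becomes a
# continuously-executing streaming task: `rand()` runs repeatedly,
# and each produced value is streamed into `println(vals)` through
# an internal stream/channel rather than returning a single value.
Dagger.spawn_streaming() do
    vals = Dagger.@spawn rand()
    print_vals = Dagger.@spawn println(vals)
end
```

The pipeline runs until one of its tasks finishes the stream (see `finish_stream` below) or is cancelled.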
Todo:
- [x] Migrate streams on first use
- [x] Add per-task input buffering to `Stream` object
- [x] Add no-allocation ring buffer for process-local put/take to `Stream`
- [x] Make buffering amount configurable
- [x] Add API for constructing streams based on inferred return type, desired buffer size, and source/destination
- [x] Allow `finish_stream(xyz; return=abc)` to return custom value (else `nothing`)
- [x] Upstream MemPool migration changes (https://github.com/JuliaData/MemPool.jl/pull/80)
- [x] Add docs
- [x] Add tests
- [ ] (Optional) Adapt ring buffer to support server-local put/take (use `mmap`?)
- [ ] (Optional) Make value fetching configurable
- [ ] (Optional) Support a `waitany`-style input stream, taking inputs from multiple tasks
- [x] (Optional) `take!` from input streams concurrently, and `waitall` on them before continuing
- [x] (Optional) `put!` into output streams concurrently, and `waitall` on them before continuing
- [ ] (Optional) Allow using default or previously-cached value if sender not ready
- [ ] (Optional) Allow dropping stream values (after timeout, receiver not ready, over-pressured, etc.)
- [x] (Optional) Add utility for tracking stream transfer rates (https://github.com/JuliaParallel/Dagger.jl/pull/494)
- [ ] (Optional) Add programmable behavior on upstream/downstream `Stream` closure (how should errors/finishing propagate?)
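For the `finish_stream` item above, ending a pipeline with a custom return value might look like this sketch. Note that `return` is a reserved word in Julia and cannot be a keyword argument name, so this sketch assumes a `result` keyword instead (as in the released streaming docs); treat the exact name as an assumption:

```julia
using Dagger

Dagger.spawn_streaming() do
    draws = Dagger.@spawn begin
        x = rand(1:10)
        # Stop the streaming task once we draw a 10; the task's final
        # result becomes :done instead of the default `nothing`.
        x == 10 && Dagger.finish_stream(x; result=:done)
        x
    end
end
```

Downstream tasks in the same pipeline are finished along with the terminating task.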
Am I correct in thinking that all the necessary items except for tests are complete?
Generally yes, I think we're pretty close to this being merge-ready. There are some remaining TODOs that I need to finish, but most are reasonably small. I could definitely use help with writing tests - just validating that we can run various kinds of pipelines and that they work across multiple workers would be really useful.
Thanks so much to @JamesWrigley and @davidizzle for making this a reality! :heart: