Dagger.jl
Add streaming API
Adds a `spawn_streaming` task queue that transforms tasks into continuously-executing equivalents, which automatically `take!` from input streams/channels and `put!` their results to an output stream/channel. Useful for processing many individual elements of some large (or infinite) collection.
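To make the above concrete, a minimal pipeline might look like the sketch below (based on the Dagger streaming docs; the exact surface API may differ slightly from this PR's intermediate state):

```julia
using Dagger

# Inside spawn_streaming(), each @spawn'd task becomes a
# continuously-executing streaming task: `rand()` runs repeatedly,
# and each produced value is streamed into `println(vals)` through
# an internal stream/channel rather than returning a single value.
Dagger.spawn_streaming() do
    vals = Dagger.@spawn rand()
    print_vals = Dagger.@spawn println(vals)
end
```

The pipeline runs until one of its tasks finishes the stream (see `finish_stream` below) or is cancelled.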
Todo:
- [x] Migrate streams on first use
- [x] Add per-task input buffering to `Stream` object
- [x] Add no-allocation ring buffer for process-local put/take to `Stream`
- [x] Make buffering amount configurable
- [x] Add API for constructing streams based on inferred return type, desired buffer size, and source/destination
- [x] Allow `finish_stream(xyz; return=abc)` to return custom value (else `nothing`)
- [x] Upstream MemPool migration changes (https://github.com/JuliaData/MemPool.jl/pull/80)
- [x] Add docs
- [x] Add tests
- [ ] (Optional) Adapt ring buffer to support server-local put/take (use `mmap`?)
- [ ] (Optional) Make value fetching configurable
- [ ] (Optional) Support a `waitany`-style input stream, taking inputs from multiple tasks
- [x] (Optional) `take!` from input streams concurrently, and `waitall` on them before continuing
- [x] (Optional) `put!` into output streams concurrently, and `waitall` on them before continuing
- [ ] (Optional) Allow using default or previously-cached value if sender not ready
- [ ] (Optional) Allow dropping stream values (after timeout, receiver not ready, over-pressured, etc.)
- [x] (Optional) Add utility for tracking stream transfer rates (https://github.com/JuliaParallel/Dagger.jl/pull/494)
- [ ] (Optional) Add programmable behavior on upstream/downstream `Stream` closure (how should errors/finishing propagate?)
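For the `finish_stream` item above, ending a pipeline with a custom return value might look like this sketch. Note that `return` is a reserved word in Julia and cannot be a keyword argument name, so this sketch assumes a `result` keyword instead (as in the released streaming docs); treat the exact name as an assumption:

```julia
using Dagger

Dagger.spawn_streaming() do
    draws = Dagger.@spawn begin
        x = rand(1:10)
        # Stop the streaming task once we draw a 10; the task's final
        # result becomes :done instead of the default `nothing`.
        x == 10 && Dagger.finish_stream(x; result=:done)
        x
    end
end
```

Downstream tasks in the same pipeline are finished along with the terminating task.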
Am I correct in thinking that all the necessary items except for tests are complete?
Generally yes, I think we're pretty close to this being merge-ready. There are some remaining TODOs that I need to finish, but most are reasonably small. I could definitely use help with writing tests - just validating that we can run various kinds of pipelines and that they work across multiple workers would be really useful.
Thanks so much to @JamesWrigley and @davidizzle for making this a reality! :heart: