Dagger.jl icon indicating copy to clipboard operation
Dagger.jl copied to clipboard

Add streaming API

Open jpsamaroo opened this issue 1 year ago • 2 comments

Adds a spawn_streaming task queue to transform tasks into continuously-executing equivalents that automatically take from inputs streams/channels and put their result to an output stream/channel. Useful for processing tons of individual elements of some large (or infinite) collection.

Todo:

  • [x] Migrate streams on first use
  • [x] Add per-task input buffering to Stream object
  • [x] Add no-allocation ring buffer for process-local put/take to Stream
  • [x] Make buffering amount configurable
  • [x] Add API for constructing streams based on inferred return type, desired buffer size, and source/destination
  • [x] Allow finish_stream(xyz; return=abc) to return custom value (else nothing)
  • [x] Upstream MemPool migration changes (https://github.com/JuliaData/MemPool.jl/pull/80)
  • [x] Add docs
  • [x] Add tests
  • [ ] (Optional) Adapt ring buffer to support server-local put/take (use mmap?)
  • [ ] (Optional) Make value fetching configurable
  • [ ] (Optional) Support a waitany-style input stream, taking inputs from multiple tasks
  • [x] (Optional) take! from input streams concurrently, and waitall on them before continuing
  • [x] (Optional) put! into output streams concurrently, and waitall on them before continuing
  • [ ] (Optional) Allow using default or previously-cached value if sender not ready
  • [ ] (Optional) Allow dropping stream values (after timeout, receiver not ready, over-pressured, etc.)
  • [x] (Optional) Add utility for tracking stream transfer rates (https://github.com/JuliaParallel/Dagger.jl/pull/494)
  • [ ] (Optional) Add programmable behavior on upstream/downstream Stream closure (how should errors/finishing propagate?)

jpsamaroo avatar Dec 21 '23 15:12 jpsamaroo

Am I correct in thinking that all the necessary items except for tests are complete?

JamesWrigley avatar Mar 29 '24 16:03 JamesWrigley

Generally yes, I think we're pretty close to this being merge-ready. There are some remaining TODOs that I need to finish, but most are reasonably small. I could definitely use help with writing tests - just validating that we can run various kinds of pipelines and that they work across multiple workers would be really useful.

jpsamaroo avatar Mar 31 '24 16:03 jpsamaroo

Thanks so much to @JamesWrigley and @davidizzle for making this a reality! :heart:

jpsamaroo avatar Dec 09 '24 21:12 jpsamaroo