candle
Support for CUDA Streams
I'm looking to leverage more of my GPU when running multiple models in parallel. It'd be great if candle had some form of support for running work on multiple CUDA streams concurrently, whether by switching the stream used internally to CUDA's per-thread default stream, by allowing the user to run closures on different streams (with_stream(|| { })), or something else.
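For illustration, here is a minimal sketch of the closure-based API shape being proposed. The `WithStream` trait and its method are hypothetical and not part of candle or cudarc; they only show how such an API could be expressed.

```rust
/// Hypothetical trait sketching the proposed closure-based stream API.
/// Not part of candle or cudarc; the names here are illustrative only.
pub trait WithStream {
    type Error;

    /// Run `f` with this device's work issued on a dedicated CUDA stream,
    /// so kernels launched from different closures can overlap on the GPU.
    fn with_stream<T>(&self, f: impl FnOnce() -> T) -> Result<T, Self::Error>;
}
```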
Here's a discussion I've opened for it on cudarc: https://github.com/coreylowman/cudarc/issues/209
I was also looking into this. It looks like cudarc now supports creating a device with its own stream (device_with_stream); have you tested this yet @michaeleisel?
I haven't, but it appears sufficient
Indeed, this seems to be sufficient: all cudarc operations now use the appropriate stream based on the cudarc::driver::CudaDevice, so I've just merged #2532, which adds a Device::new_cuda_with_stream based on this.
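A rough usage sketch, assuming Device::new_cuda_with_stream takes the GPU ordinal like Device::new_cuda does: each worker thread builds its own device, which after #2532 is bound to its own CUDA stream, so kernels issued from the two threads can overlap on the same GPU.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // Spawn two workers that share GPU 0 but each get their own stream.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            std::thread::spawn(|| -> Result<f32> {
                // Same physical GPU (ordinal 0), distinct CUDA stream per device.
                let dev = Device::new_cuda_with_stream(0)?;
                let a = Tensor::rand(0f32, 1f32, (1024, 1024), &dev)?;
                let b = Tensor::rand(0f32, 1f32, (1024, 1024), &dev)?;
                let c = a.matmul(&b)?;
                // Reading a value back synchronizes this thread's stream.
                c.sum_all()?.to_scalar::<f32>()
            })
        })
        .collect();
    for h in handles {
        println!("sum = {}", h.join().unwrap()?);
    }
    Ok(())
}
```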