Support for CUDA Streams

Open michaeleisel opened this issue 1 year ago • 4 comments

I'm looking to leverage more of my GPU when running multiple models in parallel. It'd be great if candle had some sort of support for running multiple concurrent streams at once, whether through changing the stream used internally to CUDA's per-thread default stream, or allowing the user to run closures in different streams (with_stream(|| { })), or something else.

michaeleisel avatar Feb 23 '24 20:02 michaeleisel

Here's a discussion I've opened for it on cudarc: https://github.com/coreylowman/cudarc/issues/209

michaeleisel avatar Mar 15 '24 13:03 michaeleisel

I was also looking into this. It looks like cudarc now supports creating a device_with_stream. Have you tested this yet, @michaeleisel?

xnorpx avatar Aug 11 '24 03:08 xnorpx

I haven't, but it appears sufficient

michaeleisel avatar Aug 11 '24 06:08 michaeleisel

Indeed, this seems to be sufficient, as all cudarc operations now use the appropriate stream based on the cudarc::driver::CudaDevice. I've just merged #2532, which adds a Device::new_cuda_with_stream based on this.
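For the original use case (several models running in parallel), one possible pattern is to give each worker thread its own device handle, so that each worker's operations go to that device's stream. The following is a minimal sketch of that pattern, not code from the PR: `run_on_separate_devices` is a hypothetical helper, and a `String` stands in for the device so the example runs without a GPU. With candle built with the `cuda` feature, the factory closure would presumably be `|| candle_core::Device::new_cuda_with_stream(0)`.

```rust
use std::thread;

/// Hypothetical helper: runs `n_workers` jobs on separate threads.
/// Each worker builds its own device handle via `make_device`, so a
/// per-stream device is never shared between workers.
fn run_on_separate_devices<D, E, T>(
    n_workers: usize,
    make_device: impl Fn() -> Result<D, E> + Sync,
    job: impl Fn(usize, &D) -> T + Sync,
) -> Result<Vec<T>, E>
where
    T: Send,
    E: Send,
{
    thread::scope(|s| {
        // Spawn one scoped thread per worker; the device is created
        // inside the thread, so `D` does not need to be `Send`.
        let handles: Vec<_> = (0..n_workers)
            .map(|i| {
                let make_device = &make_device;
                let job = &job;
                s.spawn(move || make_device().map(|dev| job(i, &dev)))
            })
            .collect();
        // Join in spawn order; the first device-creation error (if any)
        // is returned instead of the results.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Stand-in device so the sketch runs without CUDA. With candle
    // (assumption on usage), the factory would instead be:
    //     || candle_core::Device::new_cuda_with_stream(0)
    // and `job` would load a model onto `dev` and run inference.
    let make_device = || Ok::<String, String>(String::from("stand-in device"));
    let results = run_on_separate_devices(2, make_device, |i, dev: &String| {
        format!("worker {i} used {dev}")
    })
    .unwrap();
    for line in &results {
        println!("{line}");
    }
}
```

Whether kernels from different streams actually overlap depends on the GPU and on how much of it each kernel occupies, so it is worth profiling before relying on this for throughput.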

LaurentMazare avatar Oct 02 '24 19:10 LaurentMazare