candle
Support for CUDA Streams
I'm looking to leverage more of my GPU when running multiple models in parallel. It'd be great if candle had some form of support for running work on multiple CUDA streams concurrently, whether by switching the stream used internally to CUDA's per-thread default stream, by allowing the user to run closures on different streams (with_stream(|| { })), or something else.
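For illustration, here is a minimal sketch of the closure-based API shape being proposed. The `WithStream` trait and its method are hypothetical and not part of candle or cudarc; they only show how such an API could be expressed.

```rust
/// Hypothetical trait sketching the proposed closure-based stream API.
/// Not part of candle or cudarc; the names here are illustrative only.
pub trait WithStream {
    type Error;

    /// Run `f` with this device's work issued on a dedicated CUDA stream,
    /// so kernels launched from different closures can overlap on the GPU.
    fn with_stream<T>(&self, f: impl FnOnce() -> T) -> Result<T, Self::Error>;
}
```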
Here's a discussion I've opened for it on cudarc: https://github.com/coreylowman/cudarc/issues/209
I was also looking into this. It looks like cudarc now supports creating a device with its own stream (device_with_stream); have you tested this yet @michaeleisel?
I haven't, but it appears sufficient
Indeed, this seems to be sufficient: all cudarc operations now use the appropriate stream based on the cudarc::driver::CudaDevice, so I've just merged #2532, which adds a Device::new_cuda_with_stream based on this.
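A rough usage sketch, assuming Device::new_cuda_with_stream takes the GPU ordinal like Device::new_cuda does: each worker thread builds its own device, which after #2532 is bound to its own CUDA stream, so kernels issued from the two threads can overlap on the same GPU.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // Spawn two workers that share GPU 0 but each get their own stream.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            std::thread::spawn(|| -> Result<f32> {
                // Same physical GPU (ordinal 0), distinct CUDA stream per device.
                let dev = Device::new_cuda_with_stream(0)?;
                let a = Tensor::rand(0f32, 1f32, (1024, 1024), &dev)?;
                let b = Tensor::rand(0f32, 1f32, (1024, 1024), &dev)?;
                let c = a.matmul(&b)?;
                // Reading a value back synchronizes this thread's stream.
                c.sum_all()?.to_scalar::<f32>()
            })
        })
        .collect();
    for h in handles {
        println!("sum = {}", h.join().unwrap()?);
    }
    Ok(())
}
```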