
CUDA Stream Support


Thank you for creating and maintaining this library; it's great to be able to experiment with machine learning in Rust. I am looking for potential performance improvements in some code that uses it.

Based on the CUDA streams section at https://pytorch.org/docs/stable/notes/cuda.html, it is my understanding that using streams is necessary to allow multiple operations to execute concurrently on a single GPU (one workaround could be to run operations from different processes, but that also has overhead).

Here are some sources I found that describe how streams can be used with the C++ API:

  • https://discuss.pytorch.org/t/cuda-streams-in-c-lib/38217
  • https://github.com/pytorch/pytorch/issues/16614#issuecomment-461160875
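
For illustration, here is roughly what that looks like with the libtorch C++ API described in those threads. This is only a sketch, not something tch exposes today; it assumes a CUDA-enabled build of libtorch and uses the c10::cuda spellings of the stream API.

```cpp
// Sketch: two independent matmuls are queued on separate CUDA streams so the
// GPU is free to overlap them; both streams are then waited on.
#include <torch/torch.h>
#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>

int main() {
  if (!torch::cuda::is_available()) return 0;

  auto a = torch::randn({2048, 2048}, torch::kCUDA);
  auto b = torch::randn({2048, 2048}, torch::kCUDA);
  // Streams from the pool do not synchronize with the default stream, so make
  // sure the inputs are ready before other streams consume them.
  c10::cuda::getCurrentCUDAStream().synchronize();

  // Streams are handed out from PyTorch's internal pool.
  c10::cuda::CUDAStream s1 = c10::cuda::getStreamFromPool();
  c10::cuda::CUDAStream s2 = c10::cuda::getStreamFromPool();

  torch::Tensor x, y;
  {
    c10::cuda::CUDAStreamGuard guard(s1);  // ops below are queued on s1
    x = torch::mm(a, b);
  }
  {
    c10::cuda::CUDAStreamGuard guard(s2);  // ops below are queued on s2
    y = torch::mm(b, a);
  }

  s1.synchronize();
  s2.synchronize();
  return 0;
}
```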

Would it be possible to add support for CUDA streams to tch?

k21 · Jul 28 '20

Related C++ API: the at::cuda::CUDAStreamGuard struct and the c10::cuda::setCurrentCUDAStream function.
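
A minimal sketch of how those two are used (c10::cuda spellings; the guard is the RAII form and restores the previous current stream when it goes out of scope, while setCurrentCUDAStream changes it until it is set back):

```cpp
#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>

void run_on_side_stream() {  // hypothetical helper, for illustration only
  auto stream = c10::cuda::getStreamFromPool();

  // RAII form: the previous current stream is restored at the end of scope.
  {
    c10::cuda::CUDAStreamGuard guard(stream);
    // ... tensor ops here are queued on `stream` ...
  }

  // Explicit form: the caller is responsible for restoring the old stream.
  auto prev = c10::cuda::getCurrentCUDAStream();
  c10::cuda::setCurrentCUDAStream(stream);
  // ... tensor ops here are queued on `stream` ...
  c10::cuda::setCurrentCUDAStream(prev);
}
```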

NOBLES5E · Nov 07 '20

@LaurentMazare Right now I'm loading your great rust-bert crate to run a model. I'd like to have a bunch of Rust threads run operations on the same model, but it seems that I gradually run out of memory. I could wrap the model (and therefore all of torch) in a mutex, but then I can't saturate the GPU's throughput. I suspect that if I put every Rust thread in its own CUDA stream it would work. Do you agree with that?

If so, what are the steps for getting this feature merged? I could start by actually testing it!
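
In case it helps the discussion, here is roughly what the per-thread-stream idea looks like at the libtorch C++ level. This is only a sketch: torch::nn::Linear stands in for the actual rust-bert model, and tch would need to expose an equivalent API for this to be usable from Rust threads.

```cpp
#include <torch/torch.h>
#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>
#include <thread>
#include <vector>

int main() {
  if (!torch::cuda::is_available()) return 0;

  // Stand-in for the real model; the weights are only read during inference,
  // so the threads can share it.
  auto model = torch::nn::Linear(1024, 1024);
  model->to(torch::kCUDA);

  std::vector<std::thread> workers;
  for (int i = 0; i < 4; ++i) {
    workers.emplace_back([&model] {
      torch::NoGradGuard no_grad;
      // Each thread queues its work on its own stream so forward passes
      // from different threads can overlap on the GPU.
      auto stream = c10::cuda::getStreamFromPool();
      c10::cuda::CUDAStreamGuard guard(stream);
      auto input = torch::randn({32, 1024}, torch::kCUDA);
      auto output = model->forward(input);
      stream.synchronize();
    });
  }
  for (auto& t : workers) t.join();
  return 0;
}
```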

njaard · Nov 17 '22