tch-rs
Preliminary support for stream guard.
This adds some basic support for CUDA streams (#234). Note that this has not been tested yet and is unlikely even to compile, as I don't have access to a CUDA-enabled box at the moment.
Usage would be as follows:
let stream = tch::CudaStream::from_pool(/*high_priority=*/false, /*device_idx=*/1);
let _guard = tch::CudaStreamGuard::new(stream);
// Operations from here until the end of the scope should take place on the created stream.
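For context, the guard follows Rust's usual RAII shape; here is an illustrative sketch of scoping it around actual work (the restore-on-drop behavior is an assumption about the design, and relu is just a stand-in op):

// Illustrative only, not the PR's implementation: the guard makes `stream`
// current on creation, and dropping it restores the previous stream.
fn forward_on_side_stream(xs: &tch::Tensor) -> tch::Tensor {
    let stream = tch::CudaStream::from_pool(/*high_priority=*/false, /*device_idx=*/1);
    let ys = {
        let _guard = tch::CudaStreamGuard::new(stream);
        // Everything in this scope runs on the pooled stream.
        xs.relu()
    }; // _guard is dropped here, restoring the previous stream.
    ys
}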
There are still some difficulties in getting this merged:
- The PyTorch C++ headers this PR requires have references to the standard CUDA headers. This PyTorch issue (https://github.com/pytorch/pytorch/issues/55454) has been opened to see if we can get around this. In the meantime, adding some .include("/usr/lib/cuda/include") to torch-sys/build.rs might help.
- This may require linking with c10_cuda, e.g. by adding the following line to torch-sys/build.rs:
println!("cargo:rustc-link-lib=c10_cuda");
It may also be required to link cudart explicitly.
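Putting those three workarounds together, the additions to torch-sys/build.rs might look like the sketch below; the include path and whether each link line is needed are assumptions that depend on how CUDA was installed locally.

// Sketch of the build.rs additions discussed above (how to hook this into
// the existing torch-sys build script is left out).
fn configure_cuda(build: &mut cc::Build) {
    // Make the standard CUDA headers visible when compiling the C++ wrapper.
    // The path varies between installs; /usr/local/cuda/include is also common.
    build.include("/usr/lib/cuda/include");
    // Link the CUDA-specific part of c10.
    println!("cargo:rustc-link-lib=c10_cuda");
    // Linking the CUDA runtime explicitly may also be needed.
    println!("cargo:rustc-link-lib=cudart");
}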
Hey, curious if there is any movement on adding the stream API to tch-rs. I am working on an inference pipeline where having control over the CUDA stream would be beneficial for eking out some more performance.
Not much progress I'm afraid. The change in this PR is pretty straightforward, but the trickiness is that the required headers end up depending on the CUDA headers, and these may not be installed if CUDA was vendored with PyTorch rather than properly installed. The related PyTorch issue https://github.com/pytorch/pytorch/issues/55454 hasn't seen much activity (also related: https://github.com/pytorch/pytorch/issues/47743).
Having these headers available would unlock support for both CUDA streams and CUDA graphs in tch, so it would be great to have, but it's a bit stuck at the moment.
Ah, I see. I myself am using the vendored CUDA installation from PyTorch as a means to reduce the size of our Docker image. However, we have had to patch the install by creating symlinks with the canonical shared library names to get external CUDA code to run (e.g. the CUDA GStreamer elements). Another annoyance is that other cudatoolkit binaries like nvcc are not available, unless we install another full-blown cudatoolkit into the image.
It seems there isn't much movement on the upstream project to expose the requisite headers here. Neither of the following options is ideal, but what are your thoughts on: a) vendoring the CUDA headers yourself as part of tch-rs, or b) maintaining a third-party build of libtorch that includes the full cudatoolkit? I have not investigated the level of effort for the latter, but it would be advantageous for me to have access to the CUDA headers and nvcc to compile custom kernels alongside our torch models.
Vendoring the CUDA headers might be a bit tricky. A specific version of tch is tied to a specific release of libtorch, but these may use different versions of CUDA (which I guess is pretty helpful to users with older GPUs/NVIDIA drivers).
No clue how much effort it would be to host fully self-contained binaries, but I guess it might be non-trivial, as these would have to run in a wide variety of environments. Maybe there are some specific bits of the PyTorch ecosystem that could be leveraged for this, e.g. pytorch/builder.
Another possibility would be for tch to try to detect the CUDA install and only activate the advanced CUDA functionalities (CUDA guards and all) if these are available.
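A minimal sketch of what that detection could look like in torch-sys/build.rs (the candidate paths, the probe header, and the cfg name are all assumptions, not existing build script behavior):

use std::path::Path;

fn main() {
    // Probe a few conventional locations for the CUDA headers and only
    // enable the stream/guard API when one of them is found.
    let candidates = ["/usr/local/cuda/include", "/usr/lib/cuda/include"];
    if let Some(dir) = candidates
        .into_iter()
        .find(|dir| Path::new(dir).join("cuda_runtime.h").exists())
    {
        // Downstream code could gate the guard API on this cfg flag,
        // e.g. #[cfg(cuda_headers_available)].
        println!("cargo:rustc-cfg=cuda_headers_available");
        println!("cargo:include={}", dir);
    }
}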
Overall, the best outcome would certainly be for the libtorch binaries to embed a bit more of the header files, but indeed there hasn't been much progress on this over the last year, so we should probably find a good way around it.