tch-rs
Preliminary support for stream guard.
This adds some basic support for CUDA streams (#234). Note that this has not been tested yet and is unlikely even to compile, as I don't have access to a CUDA-enabled box at the moment.
Usage would be as follows:
let stream = tch::CudaStream::from_pool(/*high_priority=*/false, /*device_idx=*/1);
let _guard = tch::CudaStreamGuard::new(stream);
// Operations from here until the end of the scope should take place on the created stream.
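For context, the guard follows Rust's usual RAII shape; here is an illustrative sketch of scoping it around actual work (the restore-on-drop behavior is an assumption about the design, and relu is just a stand-in op):

// Illustrative only, not the PR's implementation: the guard makes `stream`
// current on creation, and dropping it restores the previous stream.
fn forward_on_side_stream(xs: &tch::Tensor) -> tch::Tensor {
    let stream = tch::CudaStream::from_pool(/*high_priority=*/false, /*device_idx=*/1);
    let ys = {
        let _guard = tch::CudaStreamGuard::new(stream);
        // Everything in this scope runs on the pooled stream.
        xs.relu()
    }; // _guard is dropped here, restoring the previous stream.
    ys
}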
There are still some difficulties in getting this merged:
- The PyTorch C++ headers this PR requires have references to the standard CUDA headers. This PyTorch issue (https://github.com/pytorch/pytorch/issues/55454) has been opened to see if we can get around this. In the meantime, adding some .include("/usr/lib/cuda/include") to torch-sys/build.rs might help.
- This may require linking with c10_cuda, e.g. by adding the following line to torch-sys/build.rs:
println!("cargo:rustc-link-lib=c10_cuda");
It may also be required to link cudart explicitly.
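Putting those three workarounds together, the additions to torch-sys/build.rs might look like the sketch below; the include path and whether each link line is needed are assumptions that depend on how CUDA was installed locally.

// Sketch of the build.rs additions discussed above (how to hook this into
// the existing torch-sys build script is left out).
fn configure_cuda(build: &mut cc::Build) {
    // Make the standard CUDA headers visible when compiling the C++ wrapper.
    // The path varies between installs; /usr/local/cuda/include is also common.
    build.include("/usr/lib/cuda/include");
    // Link the CUDA-specific part of c10.
    println!("cargo:rustc-link-lib=c10_cuda");
    // Linking the CUDA runtime explicitly may also be needed.
    println!("cargo:rustc-link-lib=cudart");
}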
Hey, curious if there is any movement on adding the stream API to tch-rs. I am working on an inference pipeline where having control over the CUDA stream would be beneficial for eking out some more performance.
Not much progress I'm afraid. The change in this PR is pretty straightforward, but the trickiness is that the required headers end up depending on the CUDA headers, and these may not be installed if CUDA was vendored with PyTorch rather than properly installed. The related PyTorch issue https://github.com/pytorch/pytorch/issues/55454 hasn't seen much activity (also related: https://github.com/pytorch/pytorch/issues/47743).
Having these headers available would unlock support for both CUDA streams and CUDA graphs in tch, so it would be great to have, but it's a bit stuck at the moment.
Ah, I see. I myself am using the vendored CUDA installation from PyTorch as a means to reduce the size of our Docker image. However, we have had to patch the install by creating symlinks with the canonical shared library names to get external CUDA code to run (e.g. the CUDA GStreamer elements). Another annoyance is that other cudatoolkit binaries like nvcc are not available, unless we install another full-blown cudatoolkit into the image.
It seems there isn't much movement on the upstream project to expose the requisite headers here. Neither of the following options is ideal, but what are your thoughts on: a) vendoring the CUDA headers yourself as part of tch-rs, or b) maintaining a third-party build of libtorch that includes the full cudatoolkit? I have not investigated the level of effort for the latter, but it would be advantageous for me to have access to the CUDA headers and nvcc to compile custom kernels alongside our torch models.
Vendoring the CUDA headers might be a bit tricky. A specific version of tch is tied to a specific release of libtorch, but these may use different versions of CUDA (which I guess is pretty helpful to users with older GPUs/NVIDIA drivers).
No clue how much effort it would be to host fully self-contained binaries, but I guess it might be non-trivial, as these would have to run in a wide variety of environments. Maybe there are some specific bits of the PyTorch ecosystem that could be leveraged for this, e.g. pytorch/builder.
Another possibility would be for tch to try to detect the CUDA install and only activate the advanced CUDA functionalities (CUDA guards and all) if these are available.
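A minimal sketch of what that detection could look like in torch-sys/build.rs (the candidate paths, the probe header, and the cfg name are all assumptions, not existing build script behavior):

use std::path::Path;

fn main() {
    // Probe a few conventional locations for the CUDA headers and only
    // enable the stream/guard API when one of them is found.
    let candidates = ["/usr/local/cuda/include", "/usr/lib/cuda/include"];
    if let Some(dir) = candidates
        .into_iter()
        .find(|dir| Path::new(dir).join("cuda_runtime.h").exists())
    {
        // Downstream code could gate the guard API on this cfg flag,
        // e.g. #[cfg(cuda_headers_available)].
        println!("cargo:rustc-cfg=cuda_headers_available");
        println!("cargo:include={}", dir);
    }
}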
Overall, the best outcome would certainly be for the libtorch binaries to embed a bit more of the header files, but indeed there hasn't been much progress on this over the last year, so we should probably find a good way around it.