Support CUDA streams
PyTorch includes CUDA streams, which allow independent sequences of GPU operations to run concurrently.
However, it appears that TorchSharp does not support CUDA streams. I searched the codebase and can't find anything like PyTorch's torch.cuda.Stream class, or C# wrappers for methods such as wait_stream(), default_stream() and record_stream().
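For reference, here is a minimal Python sketch of the stream pattern that currently has no TorchSharp equivalent, using only the documented PyTorch calls named above (the tensor and its size are placeholders; the GPU work is guarded so the snippet is a no-op on machines without PyTorch or a CUDA device):

```python
import importlib.util

def overlap_on_side_stream():
    """Queue work on a side CUDA stream, then make the default stream wait on it.

    Mirrors the torch.cuda.Stream / wait_stream / record_stream API that
    TorchSharp would need to wrap.
    """
    import torch

    side = torch.cuda.Stream()            # new stream on the current device
    with torch.cuda.stream(side):         # ops below are enqueued on `side`
        x = torch.randn(1024, device="cuda")
        y = x * 2
    default = torch.cuda.default_stream()
    default.wait_stream(side)             # default stream waits for `side`
    y.record_stream(default)              # mark y's memory as in use on default
    return y

# Only exercise the API when PyTorch is installed and a CUDA device exists.
if importlib.util.find_spec("torch") is not None:
    import torch
    if torch.cuda.is_available():
        overlap_on_side_stream()
```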
hey @medovina, thanks for the heads up.
It seems we're currently missing this implementation, so I'm adding the missing-feature tag here.
I've checked PyTorch's wrapper over libtorch; it mostly depends directly on CUDA API calls:
stream.py
Stream.cpp
We'll consider this for a future version.
Great, thanks for considering this. Streams can be quite important for good performance when running inference from multiple threads, so I'd be very happy to see them supported in TorchSharp.
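To illustrate the multi-threaded inference use case, here is a hedged Python sketch using PyTorch's existing stream API; the model, batch shapes, and threading scheme are invented for the example, and the GPU portion only runs when PyTorch and a CUDA device are present:

```python
import importlib.util
import threading

def infer_on_own_stream(model, batch):
    """Run one inference request on its own CUDA stream, so kernels launched
    from concurrent threads can overlap instead of serializing on the
    default stream."""
    import torch

    stream = torch.cuda.Stream()
    with torch.cuda.stream(stream), torch.no_grad():
        out = model(batch.to("cuda", non_blocking=True))
    stream.synchronize()  # wait only for this request's kernels to finish
    return out

# Guarded demo: one thread (and one stream) per inference request.
if importlib.util.find_spec("torch") is not None:
    import torch
    if torch.cuda.is_available():
        model = torch.nn.Linear(16, 4).cuda().eval()
        batches = [torch.randn(8, 16) for _ in range(4)]
        threads = [
            threading.Thread(target=infer_on_own_stream, args=(model, b))
            for b in batches
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
```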