TensorFlowSharp
Passing input from GPU
Is your feature request related to a problem? Please describe.
When working with large datasets, copying data back and forth between the CPU and the GPU is cumbersome, especially in a pipeline architecture.
Describe the solution you'd like
I am working in a pipeline architecture where all intermediate data resides on the GPU. I would like to pass data that is already on the GPU into a DL network, with the output also remaining on the GPU so it can be handed to the next stage of the pipeline. Essentially, I would like to pass a CUDA array into the network, get an output, and convert it back into a CUDA array, all without performing CPU/GPU copies. Is this possible at the moment? The sketch below shows the round trip I am currently forced into.
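For illustration, here is a minimal sketch (not a proposal for the API) of the host round trip I want to eliminate, using the standard TensorFlowSharp runner. The node names "input" and "output" and the CopyDeviceToHost helper are hypothetical; the device copy would come from whatever CUDA interop layer is in use (e.g. ManagedCuda), not from TensorFlowSharp itself.

```csharp
using System;
using TensorFlow;

static class GpuRoundTrip
{
    // Runs one pipeline stage with explicit host copies on both sides.
    public static float[] RunWithHostCopies(TFSession session, TFGraph graph,
                                            IntPtr deviceInput, int length)
    {
        // 1. Device -> host copy (the first copy I would like to eliminate).
        var hostInput = new float[length];
        CopyDeviceToHost(deviceInput, hostInput);

        // 2. Wrap the host buffer in a TFTensor and run the graph.
        using (var inputTensor = new TFTensor(hostInput))
        {
            TFTensor[] results = session.GetRunner()
                                        .AddInput(graph["input"][0], inputTensor)
                                        .Fetch(graph["output"][0])
                                        .Run();

            // 3. The result lands on the host; the next pipeline stage then
            //    needs a host -> device copy to see it as a CUDA array again
            //    (the second copy I would like to eliminate).
            return (float[])results[0].GetValue();
        }
    }

    // Placeholder for the CUDA interop call (e.g. cuMemcpyDtoH);
    // not part of TensorFlowSharp.
    static void CopyDeviceToHost(IntPtr src, float[] dst) { /* ... */ }
}
```

What I am asking for is a way to build the input TFTensor directly from a device pointer and to fetch the output as a device pointer, so steps 1 and 3 disappear.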
Describe alternatives you've considered
To my understanding, PyTorch and PyCUDA can do this, but I need to work within a C# environment.