Create GPU TFTensor from CUDA array on GPU to avoid the device-to-host copy.
Is your feature request related to a problem? Please describe.
A TFTensor object is usually constructed from a CPU array, which means the data must first be copied from the device (GPU) to the host (CPU). This pipeline architecture is considerably slow for large datasets (e.g. large images).
Describe the solution you'd like
My pipeline performs image processing via CUDA (using the ManagedCuda wrapper) and its libraries (NPP). At some point I would like to feed my CNN with an image stored on the GPU as an NPP image, a CUDA array, or just a device pointer; call this d_array for convenience. Of course, one can copy it to the host to get a standard host array:
d_array.CopyToHost(h_array);
and then construct the tensor in the usual way:
var tensor = new TFTensor(h_array);
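For reference, a minimal self-contained sketch of this current workaround, assuming ManagedCuda plus TensorFlowSharp; the image dimensions are placeholders and the NPP processing step is elided:

```csharp
using ManagedCuda;
using TensorFlow;

int width = 1024, height = 1024, channels = 3;   // example dimensions
var ctx = new CudaContext(0);

// d_array: image data already resident on the GPU (e.g. the output of NPP processing)
var d_array = new CudaDeviceVariable<float>(width * height * channels);
// ... NPP / CUDA kernels fill d_array here ...

// device-to-host copy -- the step this feature request would like to avoid
var h_array = new float[width * height * channels];
d_array.CopyToHost(h_array);

// host array -> TFTensor, ready to be fed to the network
var tensor = new TFTensor(h_array);
```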
Is there an option to get a device tensor d_tensor from d_array directly, avoiding the CopyToHost operation, and feed it to the CNN?