Create GPU TFTensor from CUDA array on GPU to avoid the device-to-host copy.
Is your feature request related to a problem? Please describe.
A TFTensor object is usually constructed from a CPU array, which means the data must first be copied from the device (GPU) to the host (CPU). This pipeline architecture is considerably slow for large datasets (e.g. large images).
Describe the solution you'd like
My pipeline performs image processing via CUDA (using the ManagedCuda wrapper) and its libraries (NPP). At some point I would like to feed my CNN with an image stored on the GPU as an NPP image, a CUDA array, or just a device pointer; call this d_array for convenience. Of course, one can copy it to the host to get a standard host array:
d_array.CopyToHost(h_array);
and then construct the tensor in the usual way:
var tensor = new TFTensor(h_array);
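For reference, a minimal self-contained sketch of this current workaround, assuming ManagedCuda plus TensorFlowSharp; the image dimensions are placeholders and the NPP processing step is elided:

```csharp
using ManagedCuda;
using TensorFlow;

int width = 1024, height = 1024, channels = 3;   // example dimensions
var ctx = new CudaContext(0);

// d_array: image data already resident on the GPU (e.g. the output of NPP processing)
var d_array = new CudaDeviceVariable<float>(width * height * channels);
// ... NPP / CUDA kernels fill d_array here ...

// device-to-host copy -- the step this feature request would like to avoid
var h_array = new float[width * height * channels];
d_array.CopyToHost(h_array);

// host array -> TFTensor, ready to be fed to the network
var tensor = new TFTensor(h_array);
```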
Is there an option to get a device tensor d_tensor from d_array directly, avoiding the CopyToHost operation, and feed it to the CNN?