
Passing Input Data from GPU in Deployment

Open solarflarefx opened this issue 5 years ago • 3 comments

Hello,

In most applications (at least in my experience), source data usually comes from the CPU, whether in training or in deployment. Developers will often train a model on the GPU and then may or may not use the GPU in deployment, depending on the application and its needs. However, even when the GPU is used in deployment, from what I have seen the data fed to the model is usually stored on disk, manipulated (perhaps through a formatting step), and then passed through the model.

In my case, I am working within a pipeline architecture where the source data is already on the GPU, and I need to pass this data through a model. In particular, the pipeline is written in C# and I have a trained TensorFlow/Keras model (which I trained in Python). As of now, the only way I know to use the model is to copy the data from the GPU to the CPU, pass it into the model, and then possibly copy the output from the CPU back to the GPU for use in the rest of the pipeline. This seems very inefficient (my datasets are fairly large and the speed of the overall pipeline is important). I was wondering whether it is possible to feed the trained model directly from GPU memory and then continue passing data through the pipeline without copies to and from the CPU. I am wondering whether the data pipelines in this TensorFlow binding would allow for something like this; if there is another approach, I am open to that as well.

solarflarefx avatar Nov 11 '19 00:11 solarflarefx

@solarflarefx to see if I can help you, I need to know exactly how the data is currently stored on the GPU. Is it a DirectX/OpenGL texture? A CUDA buffer?

This is an important use case; I will keep this issue open to track its status.

lostmsu avatar Nov 12 '19 01:11 lostmsu

@lostmsu So as of now it's a ManagedCuda array. I am working with a pipeline where data is initially loaded from the CPU to the GPU at the beginning, but once on the GPU it would be preferable to keep all the data there until the very end. The other parts of the pipeline are able to do this, but I need to insert a trained machine learning model to run inference somewhere in the middle, and again, for speed, I'd like to avoid GPU/CPU copies if possible. I'm not sure about the feasibility of this myself yet, but perhaps you have some experience doing this.
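
To make the round trip concrete, here is roughly what that middle step looks like for me today. Only the ManagedCuda calls (`CopyToHost`/`CopyToDevice` on `CudaDeviceVariable<float>`) are meant to be real API; `RunInference` is just a placeholder for however the trained Keras model ends up being invoked from C#:

```csharp
using ManagedCuda;

// deviceInput already lives in GPU memory as part of the pipeline;
// elementCount is the number of floats it holds.
static CudaDeviceVariable<float> RunThroughModel(
    CudaDeviceVariable<float> deviceInput, int elementCount)
{
    var host = new float[elementCount];
    deviceInput.CopyToHost(host);            // GPU -> CPU copy I'd like to eliminate

    float[] result = RunInference(host);     // placeholder: model inference on CPU data

    var deviceOutput = new CudaDeviceVariable<float>(result.Length);
    deviceOutput.CopyToDevice(result);       // CPU -> GPU copy to rejoin the pipeline
    return deviceOutput;
}
```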

If anything is unclear or you need more details please let me know.

solarflarefx avatar Nov 22 '19 18:11 solarflarefx

This is still not supported by TensorFlow directly: https://github.com/tensorflow/tensorflow/issues/29039

lostmsu avatar May 04 '21 03:05 lostmsu