[Python Backend] Send PbTensor to cpu for calling as_numpy() or add a function as_cupy()

Open zzk0 opened this issue 3 years ago • 12 comments

Is your feature request related to a problem? Please describe. In the Python backend, I send an inference request to served models and get the inference response back. What I need is a NumPy array, so I call as_numpy(). But an error occurs: the tensor is stored on the GPU and cannot be converted to NumPy.

Describe the solution you'd like

  1. Send the tensor to the CPU, so that as_numpy() can be called.
  2. Add an as_cupy() function, so the result can be used on the GPU (both options are sketched below).
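
For illustration, here is roughly what either option could look like from user code. This is only a sketch of the requested feature: to_cpu() and as_cupy() are hypothetical method names that do not exist in pb_utils today.

import triton_python_backend_utils as pb_utils

# inference_response would come from a BLS call, as in the snippet further down.
output = pb_utils.get_output_tensor_by_name(inference_response, "output")

# Option 1: copy the tensor to host memory first, then convert as usual.
numpy_array = output.to_cpu().as_numpy()

# Option 2: expose the GPU buffer directly as a CuPy array.
cupy_array = output.as_cupy()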

zzk0 avatar Nov 09 '21 08:11 zzk0

Here is the code in my Python backend:

The tensor returned by get_output_tensor_by_name cannot be converted with as_numpy(), because the tensor is stored on the GPU. It seems there is no other method to get the data out of the tensor.

        # BLS: build a request to the 'rnet' model and execute it synchronously.
        inference_request = pb_utils.InferenceRequest(
            model_name='rnet',
            requested_output_names=[self.Rnet_outputs[0], self.Rnet_outputs[1]],
            inputs=[pb_utils.Tensor(self.Rnet_inputs[0], predict_24_batch)]
        )
        inference_response = inference_request.exec()
        # These as_numpy() calls fail because the output tensors are stored on the GPU.
        cls_prob = pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[0]).as_numpy()
        roi_prob = pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[1]).as_numpy()

zzk0 avatar Nov 09 '21 08:11 zzk0

By the way, how do I debug a Python backend? Anyway, thanks for your help.

zzk0 avatar Nov 09 '21 09:11 zzk0

Have you read the section here?

Can you try the following? https://github.com/triton-inference-server/server/blob/main/qa/python_models/dlpack_io_identity/model.py#L85-L91

tanmayv25 avatar Nov 11 '21 18:11 tanmayv25

Yes, I tried. But the data has to be converted to a PyTorch tensor first, and then to a NumPy array.

I wrote a function like this:

# torch.utils.dlpack provides the DLPack bridge into PyTorch.
from torch.utils.dlpack import from_dlpack

def pb_tensor_to_numpy(pb_tensor):
    if pb_tensor.is_cpu():
        # CPU tensors convert directly.
        return pb_tensor.as_numpy()
    else:
        # GPU tensors: wrap via DLPack, copy device -> host, then convert.
        pytorch_tensor = from_dlpack(pb_tensor.to_dlpack())
        return pytorch_tensor.cpu().numpy()
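
For example, with the snippet from my earlier comment, the output tensors can be converted like this:

cls_prob = pb_tensor_to_numpy(
    pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[0]))
roi_prob = pb_tensor_to_numpy(
    pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[1]))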

It seems that Triton doesn't provide any method to get a NumPy array from a GPU tensor. Anyway, thanks for your advice.

zzk0 avatar Nov 14 '21 14:11 zzk0

It is not possible to copy a tensor from GPU to CPU in the Python backend directly. You need to use PyTorch (or any other framework that supports DLPack) to perform the conversion.
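
If pulling in PyTorch is undesirable, CuPy supports DLPack as well, so the helper above can be rewritten with a much lighter dependency. A sketch, assuming CuPy >= v10 (for cupy.from_dlpack) is installed in the container:

import cupy as cp

def pb_tensor_to_numpy(pb_tensor):
    if pb_tensor.is_cpu():
        return pb_tensor.as_numpy()
    # Zero-copy wrap of the GPU buffer via DLPack, then an explicit
    # device-to-host copy with cupy.asnumpy().
    return cp.asnumpy(cp.from_dlpack(pb_tensor.to_dlpack()))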

Edit: This feature makes sense and we've put it on the roadmap.

Tabrizian avatar Nov 23 '21 02:11 Tabrizian

PyTorch is not available with the Python backend.

manastahir avatar Jan 16 '23 10:01 manastahir

@manastahir You can pip install torch in the container, and the backend process should be able to access the module.

tanmayv25 avatar Jan 24 '23 22:01 tanmayv25

@tanmayv25 For deployments where we don't have direct access to the containers, is there any way to add it to the container start command?

manastahir avatar Jan 24 '23 22:01 manastahir

You would have to capture the dependency in a custom execution environment as described here.
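
Roughly, the steps are: build a conda environment with the extra packages, pack it with conda-pack, and point the model at the archive via EXECUTION_ENV_PATH in config.pbtxt. A sketch; the environment name and paths are examples:

# Build and pack the environment (its Python version must match the backend stub's).
conda create -n torch_env python=3.8 -y
conda activate torch_env
pip install torch conda-pack
conda-pack -o torch_env.tar.gz   # packs the active environment

# In the model's config.pbtxt, point the Python backend at the packed environment:
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/torch_env.tar.gz"}
}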

tanmayv25 avatar Jan 24 '23 22:01 tanmayv25

We don't want to add PyTorch to the package since it's too heavyweight, and importing PyTorch consumes memory on its own. A more lightweight way to copy to the CPU, analogous to as_numpy(), would work better for us.

ShuaiShao93 avatar Apr 17 '24 17:04 ShuaiShao93

@ShuaiShao93 I understand your use case, and I have updated my comment above.

Tabrizian avatar Apr 17 '24 20:04 Tabrizian

> @ShuaiShao93 I understand your use case, and I have updated my comment above.

Thank you! Please update here when it's done. Really appreciate it!

ShuaiShao93 avatar Apr 17 '24 21:04 ShuaiShao93