[Python Backend] Send PbTensor to cpu for calling as_numpy() or add a function as_cupy()

Open zzk0 opened this issue 3 years ago • 12 comments

Is your feature request related to a problem? Please describe. In the Python backend, I send an inference request to served models and get the inference response back. What I need is a NumPy array, so I call as_numpy(). But an error occurs: the tensor is stored on the GPU and cannot be converted to NumPy.

Describe the solution you'd like

  1. Send the tensor to the CPU, so that as_numpy() can be called.
  2. Add an as_cupy() function, so the result can be used on the GPU (both options are sketched below).
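
For illustration, here is roughly what either option could look like from user code. This is only a sketch of the requested feature: to_cpu() and as_cupy() are hypothetical method names that do not exist in pb_utils today.

import triton_python_backend_utils as pb_utils

# inference_response would come from a BLS call, as in the snippet further down.
output = pb_utils.get_output_tensor_by_name(inference_response, "output")

# Option 1: copy the tensor to host memory first, then convert as usual.
numpy_array = output.to_cpu().as_numpy()

# Option 2: expose the GPU buffer directly as a CuPy array.
cupy_array = output.as_cupy()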

zzk0 avatar Nov 09 '21 08:11 zzk0

Here is the code in my Python backend:

The tensor returned by get_output_tensor_by_name cannot be converted with as_numpy(), because the tensor is stored on the GPU. It seems there is no other method to get the data out of the tensor.

        # BLS: build a request to the 'rnet' model and execute it synchronously.
        inference_request = pb_utils.InferenceRequest(
            model_name='rnet',
            requested_output_names=[self.Rnet_outputs[0], self.Rnet_outputs[1]],
            inputs=[pb_utils.Tensor(self.Rnet_inputs[0], predict_24_batch)]
        )
        inference_response = inference_request.exec()
        # These as_numpy() calls fail because the output tensors are stored on the GPU.
        cls_prob = pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[0]).as_numpy()
        roi_prob = pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[1]).as_numpy()

zzk0 avatar Nov 09 '21 08:11 zzk0

By the way, how do I debug a Python backend? Anyway, thanks for your help.

zzk0 avatar Nov 09 '21 09:11 zzk0

Have you read the section here?

Can you try the following? https://github.com/triton-inference-server/server/blob/main/qa/python_models/dlpack_io_identity/model.py#L85-L91

tanmayv25 avatar Nov 11 '21 18:11 tanmayv25

Yes, I tried. But the data has to be converted to a PyTorch tensor first, and then to a NumPy array.

I wrote a function like this:

# torch.utils.dlpack provides the DLPack bridge into PyTorch.
from torch.utils.dlpack import from_dlpack

def pb_tensor_to_numpy(pb_tensor):
    if pb_tensor.is_cpu():
        # CPU tensors convert directly.
        return pb_tensor.as_numpy()
    else:
        # GPU tensors: wrap via DLPack, copy device -> host, then convert.
        pytorch_tensor = from_dlpack(pb_tensor.to_dlpack())
        return pytorch_tensor.cpu().numpy()
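
For example, with the snippet from my earlier comment, the output tensors can be converted like this:

cls_prob = pb_tensor_to_numpy(
    pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[0]))
roi_prob = pb_tensor_to_numpy(
    pb_utils.get_output_tensor_by_name(inference_response, self.Rnet_outputs[1]))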

It seems that Triton doesn't provide any method to get a NumPy array from a GPU tensor. Anyway, thanks for your advice.

zzk0 avatar Nov 14 '21 14:11 zzk0

It is not possible to copy a tensor from GPU to CPU in the Python backend directly. You need to use PyTorch (or any other framework that supports DLPack) to perform the conversion.
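
If pulling in PyTorch is undesirable, CuPy supports DLPack as well, so the helper above can be rewritten with a much lighter dependency. A sketch, assuming CuPy >= v10 (for cupy.from_dlpack) is installed in the container:

import cupy as cp

def pb_tensor_to_numpy(pb_tensor):
    if pb_tensor.is_cpu():
        return pb_tensor.as_numpy()
    # Zero-copy wrap of the GPU buffer via DLPack, then an explicit
    # device-to-host copy with cupy.asnumpy().
    return cp.asnumpy(cp.from_dlpack(pb_tensor.to_dlpack()))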

Edit: This feature makes sense and we've put it on the roadmap.

Tabrizian avatar Nov 23 '21 02:11 Tabrizian

PyTorch is not available with the Python backend.

manastahir avatar Jan 16 '23 10:01 manastahir

@manastahir You can pip install torch in the container, and the backend process should be able to access the module.

tanmayv25 avatar Jan 24 '23 22:01 tanmayv25

@tanmayv25 For deployments where we don't have direct access to the containers, is there any way to add it to the container start command?

manastahir avatar Jan 24 '23 22:01 manastahir

You would have to capture the dependency in a custom execution environment as described here.
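
Roughly, the steps are: build a conda environment with the extra packages, pack it with conda-pack, and point the model at the archive via EXECUTION_ENV_PATH in config.pbtxt. A sketch; the environment name and paths are examples:

# Build and pack the environment (its Python version must match the backend stub's).
conda create -n torch_env python=3.8 -y
conda activate torch_env
pip install torch conda-pack
conda-pack -o torch_env.tar.gz   # packs the active environment

# In the model's config.pbtxt, point the Python backend at the packed environment:
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/torch_env.tar.gz"}
}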

tanmayv25 avatar Jan 24 '23 22:01 tanmayv25

We don't want to add PyTorch to the package since it's too heavyweight, and importing PyTorch consumes memory on its own. A more lightweight way to copy to the CPU, analogous to as_numpy(), would work better for us.

ShuaiShao93 avatar Apr 17 '24 17:04 ShuaiShao93

@ShuaiShao93 I understand your use case, and I have updated my comment above.

Tabrizian avatar Apr 17 '24 20:04 Tabrizian

> @ShuaiShao93 I understand your use case, and I have updated my comment above.

Thank you! Please update here when it's done. Really appreciate it!

ShuaiShao93 avatar Apr 17 '24 21:04 ShuaiShao93