
Exception serializing request - dealing with large input

Open mhbassel opened this issue 11 months ago • 6 comments

Hey everyone!

I am trying to deploy the Whisper model, using the Python backend for that. We pass the audio as a NumPy array to the Triton server, and I set the dims in the model config to -1 so the user can determine the input shape dynamically (Whisper accepts that too).
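For reference, here is a minimal sketch of the relevant part of my config.pbtxt (the model name and tensor names below are placeholders, not my exact setup):

name: "whisper"
backend: "python"
max_batch_size: 0
input [
  {
    name: "AUDIO"            # placeholder name
    data_type: TYPE_FP32
    dims: [ -1 ]             # variable length; the client decides the shape per request
  }
]
output [
  {
    name: "TRANSCRIPT"       # placeholder name
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]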

When I try to run inference on a very long audio clip (around 24 hours) with a shape of 1,382,393,499 (very large, I know), I get this error:

Traceback (most recent call last):
  File "/media/ssdraid/training/triton/triton-test/whisper_client.py", line 145, in <module>
    results = triton_client.infer(
  File "/home/tensorflow/venvs/triton/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 1322, in infer
    raise_error_grpc(rpc_error)
  File "/home/tensorflow/venvs/triton/lib/python3.10/site-packages/tritonclient/grpc/__init__.py", line 62, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Exception serializing request!
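For scale: 1,382,393,499 FP32 samples is roughly 5.5 GB, which is far beyond the ~2 GB hard limit protobuf imposes on a single serialized message (the sample count is also consistent with about 24 hours of 16 kHz audio), so I suspect the request simply cannot fit in one gRPC message. One workaround I am considering is splitting the audio into chunks on the client side and sending one request per chunk; here is a minimal sketch, assuming 16 kHz audio (the model name, tensor names, and file path are placeholders):

import numpy as np
import tritonclient.grpc as grpcclient

MODEL_NAME = "whisper"      # placeholder
INPUT_NAME = "AUDIO"        # placeholder
OUTPUT_NAME = "TRANSCRIPT"  # placeholder

# 30 s of 16 kHz audio per request keeps every message far below
# protobuf's ~2 GB serialization limit.
CHUNK_SAMPLES = 30 * 16000

triton_client = grpcclient.InferenceServerClient(url="localhost:8001")
audio = np.load("very_long_audio.npy").astype(np.float32)  # shape: (n_samples,)

transcripts = []
for start in range(0, len(audio), CHUNK_SAMPLES):
    chunk = audio[start:start + CHUNK_SAMPLES]
    infer_input = grpcclient.InferInput(INPUT_NAME, list(chunk.shape), "FP32")
    infer_input.set_data_from_numpy(chunk)
    result = triton_client.infer(
        model_name=MODEL_NAME,
        inputs=[infer_input],
        outputs=[grpcclient.InferRequestedOutput(OUTPUT_NAME)],
    )
    transcripts.append(result.as_numpy(OUTPUT_NAME))

(System or CUDA shared memory would avoid gRPC serialization of the tensor data entirely, but I have not tried that yet.)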

Can anyone help me find a solution, please?

Thanks in advance

mhbassel · Mar 06 '24 17:03