NVIDIA-Optical-Character-Detection-and-Recognition-Solution
nvOCDR deployment with Triton on Jetson
Hello! I have a few questions regarding nvOCDR model deployment via Triton on the Jetson Orin NX (JetPack 5.1.2). I have been following the information (here and here) on the tao-5.0 branch to deploy the non-ViT based models. For more context, I have also been tracking this topic on the DeepStream forum here.
- GPU usage query: Currently, on launching the Triton Inference Server with the nvOCDR model Python backend, I see the following log regarding the model initialization.
I0424 01:24:25.977171 295 python_be.cc:2055] TRITONBACKEND_ModelInstanceInitialize: nvOCDR (CPU device 0)
I see that warpInfer() / warpInferPatches() in pybind.cpp place data on the GPU from the host, and GPU usage increases during calls to the server. However, I wanted to confirm that the nvOCDR model is actually utilizing the GPU for inference on the Jetson Orin NX with JetPack 5.1.2, or whether the "CPU device 0" in the model initialization log needs to be investigated further. Posting a few references below, followed by a hedged config sketch:
a. How to serve Python models on GPU · Issue #5889 · triton-inference-server/server · GitHub
b. Does Python backend in Triton Server for Jetson supports GPU?
c. Input tensor device placement - Triton
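My reading of reference (a) is that the "(CPU device 0)" message only reflects the instance_group kind in the model's config.pbtxt (which defaults to KIND_CPU for the Python backend), not where model.py actually runs its computation; the pycuda calls can still execute on the GPU either way. A minimal config.pbtxt sketch for making the placement explicit, assuming nothing in nvOCDR's model.py conflicts with it:

```
name: "nvOCDR"
backend: "python"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

With this in place, the TRITONBACKEND_ModelInstanceInitialize log should report "GPU device 0" instead, which would at least remove the ambiguity.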
- pynvjpeg: Regarding the usage of pynvjpeg, I get the following CUDA error from the server. Any insights on this? (A hedged workaround sketch follows the log below.)
root@ubuntu:/enhancement# python3 client.py -d /data/images/test_img/ -bs 1 --url localhost:8001
/usr/local/lib/python3.8/dist-packages/tritongrpcclient/__init__.py:33: DeprecationWarning: The package `tritongrpcclient` is deprecated and will be removed in a future version. Please use instead `tritonclient.grpc`
  warnings.warn(
[nvOCDR] Find total 2 images in /data/images/test_img/
Initializing CUDA
NvMMLiteBlockCreate : Block : BlockType = 256
[JPEG Decode] BeginSequence Display WidthxHeight 1118x1063
NvMMLiteBlockCreate : Block : BlockType = 1
[nvOCDR] Processing for: /data/images/test_img/scene_text.jpg, image size: (1063, 1118, 3)
Traceback (most recent call last):
  File "client.py", line 147, in <module>
    results = triton_client.infer(model_name=args.model_name,
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_client.py", line 1572, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'nvOCDR', message: LogicError: cuFuncSetBlockShape failed: invalid resource handle

At:
  /usr/local/lib/python3.8/dist-packages/pycuda/driver.py(481): function_call
  /opt/nvocdr/ocdr/triton/utils/cuda_resize_keep_AR.py(169): image_resize
  /opt/nvocdr/ocdr/triton/utils/process.py(87): preprocess
  /opt/nvocdr/ocdr/triton/models/nvOCDR/1/model.py(160): execute

[JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
[JPEG Decode] NvMMLiteJPEGDecBlockClose done
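From what I can tell, "LogicError: cuFuncSetBlockShape failed: invalid resource handle" from pycuda usually means the resize kernel is launched while a CUDA context other than the one it was built in is current, and the pynvjpeg hardware decode path ("Initializing CUDA", the NvMMLite blocks) appears to set up its own context. A minimal workaround sketch, not the actual nvOCDR fix: retain the device's primary context and make it current both when compiling the kernel and around every launch. KERNEL_SRC, resize_keep_ar, and launch_in_ctx are illustrative placeholders, not names from the nvOCDR code.

```python
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Placeholder kernel; the real one lives in cuda_resize_keep_AR.py.
KERNEL_SRC = r"""
__global__ void resize_keep_ar(float *dst, const float *src) {
    // placeholder body
}
"""

cuda.init()
# Retain the device's primary context so pycuda shares the context the
# CUDA runtime (and likely the decoder) uses, instead of creating a new one.
ctx = cuda.Device(0).retain_primary_context()

ctx.push()
try:
    mod = SourceModule(KERNEL_SRC)              # build the kernel in this context
    resize_kernel = mod.get_function("resize_keep_ar")
finally:
    ctx.pop()

def launch_in_ctx(*args, grid=(1, 1), block=(32, 32, 1)):
    """Launch the resize kernel with our retained context made current."""
    ctx.push()                                  # make the context current for this thread
    try:
        resize_kernel(*args, grid=grid, block=block)
    finally:
        ctx.pop()                               # restore whatever was current before
```

If the decoder also runs on the primary context, this keeps kernel compilation and launch on one and the same context, which is the usual cure for this class of error.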
- Inference speed: Inference without pynvjpeg works fine; however, the inference time per file reported by nvOCDR itself on the server is usually 100-200 ms or more. Image sizes vary between ~300x300 and 1200x1000. Is this inference time expected? (A client-side timing sketch follows below.)
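To separate nvOCDR's server-side time from network and (de)serialization overhead, here is a small client-side timing sketch, assuming the same tritonclient.grpc setup as client.py; timed_infer is an illustrative helper, and inputs/outputs are whatever client.py already builds for the request:

```python
import time
import tritonclient.grpc as grpcclient

def timed_infer(client, model_name, inputs, outputs=None, runs=10):
    """Average end-to-end (client-observed) latency in ms over several runs."""
    # Warm-up run so first-call overhead (CUDA init, lazy allocations) is excluded.
    client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
    t0 = time.perf_counter()
    for _ in range(runs):
        client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
    t1 = time.perf_counter()
    return (t1 - t0) * 1000.0 / runs

# Usage, with inputs/outputs built the same way client.py does:
# client = grpcclient.InferenceServerClient(url="localhost:8001")
# print(f"avg latency: {timed_infer(client, 'nvOCDR', inputs):.1f} ms")
```

Triton's perf_analyzer tool can additionally break latency down into queue and compute time, which would show whether the 100-200 ms is really spent inside the model itself.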
@morganh-nv @Bin-NV to check TritonServer issue
We verify on dGPU machines only. You can refer to the Dockerfile.
We are not actively working on this issue, so I will close it for now. Thank you.