NVIDIA-Optical-Character-Detection-and-Recognition-Solution
nvOCDR deployment with Triton on Jetson
Hello! I have a few questions regarding nvOCDR model deployment via Triton on the Jetson Orin NX (JetPack 5.1.2). I have been following the information (here and here) on the tao-5.0 branch to deploy the non-ViT based models. For more context, I have also been tracking this topic on the DeepStream forum here.
- GPU usage query: Currently, on launching the Triton Inference Server with the nvOCDR model Python backend, I see the following log regarding the model initialization.
I0424 01:24:25.977171 295 python_be.cc:2055] TRITONBACKEND_ModelInstanceInitialize: nvOCDR (CPU device 0)
I see that warpInfer() / warpInferPatches() in pybind.cpp place data on the GPU from the host, and GPU usage increases during calls to the server. However, I wanted to confirm that the nvOCDR model is actually utilizing the GPU for inference on the Jetson Orin NX with JetPack 5.1.2, or whether the "CPU device 0" in the model initialization log needs to be investigated further. Posting a few references below, followed by a hedged config sketch:
a. How to serve Python models on GPU · Issue #5889 · triton-inference-server/server · GitHub
b. Does Python backend in Triton Server for Jetson supports GPU?
c. Input tensor device placement - Triton
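My reading of reference (a) is that the "(CPU device 0)" message only reflects the instance_group kind in the model's config.pbtxt (which defaults to KIND_CPU for the Python backend), not where model.py actually runs its computation; the pycuda calls can still execute on the GPU either way. A minimal config.pbtxt sketch for making the placement explicit, assuming nothing in nvOCDR's model.py conflicts with it:

```
name: "nvOCDR"
backend: "python"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

With this in place, the TRITONBACKEND_ModelInstanceInitialize log should report "GPU device 0" instead, which would at least remove the ambiguity.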
- pynvjpeg: Regarding the usage of pynvjpeg, I get the following CUDA error from the server. Any insights on this? (A hedged workaround sketch follows the log below.)
root@ubuntu:/enhancement# python3 client.py -d /data/images/test_img/ -bs 1 --url localhost:8001
/usr/local/lib/python3.8/dist-packages/tritongrpcclient/__init__.py:33: DeprecationWarning: The package `tritongrpcclient` is deprecated and will be removed in a future version. Please use instead `tritonclient.grpc`
  warnings.warn(
[nvOCDR] Find total 2 images in /data/images/test_img/
Initializing CUDA
NvMMLiteBlockCreate : Block : BlockType = 256
[JPEG Decode] BeginSequence Display WidthxHeight 1118x1063
NvMMLiteBlockCreate : Block : BlockType = 1
[nvOCDR] Processing for: /data/images/test_img/scene_text.jpg, image size: (1063, 1118, 3)
Traceback (most recent call last):
  File "client.py", line 147, in <module>
    results = triton_client.infer(model_name=args.model_name,
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_client.py", line 1572, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'nvOCDR', message: LogicError: cuFuncSetBlockShape failed: invalid resource handle

At:
  /usr/local/lib/python3.8/dist-packages/pycuda/driver.py(481): function_call
  /opt/nvocdr/ocdr/triton/utils/cuda_resize_keep_AR.py(169): image_resize
  /opt/nvocdr/ocdr/triton/utils/process.py(87): preprocess
  /opt/nvocdr/ocdr/triton/models/nvOCDR/1/model.py(160): execute

[JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
[JPEG Decode] NvMMLiteJPEGDecBlockClose done
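From what I can tell, "LogicError: cuFuncSetBlockShape failed: invalid resource handle" from pycuda usually means the resize kernel is launched while a CUDA context other than the one it was built in is current, and the pynvjpeg hardware decode path ("Initializing CUDA", the NvMMLite blocks) appears to set up its own context. A minimal workaround sketch, not the actual nvOCDR fix: retain the device's primary context and make it current both when compiling the kernel and around every launch. KERNEL_SRC, resize_keep_ar, and launch_in_ctx are illustrative placeholders, not names from the nvOCDR code.

```python
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# Placeholder kernel; the real one lives in cuda_resize_keep_AR.py.
KERNEL_SRC = r"""
__global__ void resize_keep_ar(float *dst, const float *src) {
    // placeholder body
}
"""

cuda.init()
# Retain the device's primary context so pycuda shares the context the
# CUDA runtime (and likely the decoder) uses, instead of creating a new one.
ctx = cuda.Device(0).retain_primary_context()

ctx.push()
try:
    mod = SourceModule(KERNEL_SRC)              # build the kernel in this context
    resize_kernel = mod.get_function("resize_keep_ar")
finally:
    ctx.pop()

def launch_in_ctx(*args, grid=(1, 1), block=(32, 32, 1)):
    """Launch the resize kernel with our retained context made current."""
    ctx.push()                                  # make the context current for this thread
    try:
        resize_kernel(*args, grid=grid, block=block)
    finally:
        ctx.pop()                               # restore whatever was current before
```

If the decoder also runs on the primary context, this keeps kernel compilation and launch on one and the same context, which is the usual cure for this class of error.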
- Inference speed: Inference without pynvjpeg works fine; however, the inference time per file reported by nvOCDR itself on the server is usually 100-200 ms or more. Image sizes vary between ~300x300 and 1200x1000. Is this inference time expected? (A client-side timing sketch follows below.)
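To separate nvOCDR's server-side time from network and (de)serialization overhead, here is a small client-side timing sketch, assuming the same tritonclient.grpc setup as client.py; timed_infer is an illustrative helper, and inputs/outputs are whatever client.py already builds for the request:

```python
import time
import tritonclient.grpc as grpcclient

def timed_infer(client, model_name, inputs, outputs=None, runs=10):
    """Average end-to-end (client-observed) latency in ms over several runs."""
    # Warm-up run so first-call overhead (CUDA init, lazy allocations) is excluded.
    client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
    t0 = time.perf_counter()
    for _ in range(runs):
        client.infer(model_name=model_name, inputs=inputs, outputs=outputs)
    t1 = time.perf_counter()
    return (t1 - t0) * 1000.0 / runs

# Usage, with inputs/outputs built the same way client.py does:
# client = grpcclient.InferenceServerClient(url="localhost:8001")
# print(f"avg latency: {timed_infer(client, 'nvOCDR', inputs):.1f} ms")
```

Triton's perf_analyzer tool can additionally break latency down into queue and compute time, which would show whether the 100-200 ms is really spent inside the model itself.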
@morganh-nv @Bin-NV to check TritonServer issue
We verify on dGPU machines only. You can refer to the Dockerfile.
We are not actively working on this issue, so I will close it for now. Thank you.