
nvOCDR deployment with Triton on Jetson

abhay-iy97 opened this issue 10 months ago

Hello! I have a few questions about deploying the nvOCDR model via Triton on the Jetson Orin NX (JetPack 5.1.2). I have been following the information (here and here) on the tao-5.0 branch to deploy the non-ViT based models. For more context, I have also been tracking this topic on the DeepStream forum here.

  1. GPU usage query - Currently, on launching the Triton Inference Server with the nvOCDR Python backend model, I see the following log during model initialization.

    I0424 01:24:25.977171 295 python_be.cc:2055] TRITONBACKEND_ModelInstanceInitialize: nvOCDR (CPU device 0)
    

    I can see that warpInfer() / warpInferPatches() in pybind.cpp copy data from the host onto the GPU, and GPU usage rises during calls to the server. However, I wanted to confirm whether the nvOCDR model is actually using the GPU for inference on the Jetson Orin NX with JetPack 5.1.2, or whether the "CPU device 0" in the model initialization log needs to be investigated further (see the sketch after this item). A few references:
    a. How to serve Python models on GPU · Issue #5889 · triton-inference-server/server · GitHub
    b. Does Python backend in Triton Server for Jetson support GPU?
    c. Input tensor device placement - Triton
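
    From issue #5889 above, my understanding is that a Python backend instance reported as "(CPU device 0)" can still launch GPU work from inside model.py, so that log line by itself may not mean CPU-only inference. To double-check device placement from inside the backend, I am thinking of a sketch like the following (the tensor name "input_image" is a placeholder, not necessarily the one in the nvOCDR config.pbtxt):

    # Minimal sketch, not the shipped nvOCDR model.py: log whether Triton
    # delivered the request input in CPU or GPU memory.
    import triton_python_backend_utils as pb_utils

    class TritonPythonModel:
        def execute(self, requests):
            responses = []
            for request in requests:
                tensor = pb_utils.get_input_tensor_by_name(request, "input_image")
                # is_cpu() reports where the tensor memory lives; False
                # means the data is already resident on the GPU.
                print(f"[nvOCDR] input on CPU memory: {tensor.is_cpu()}", flush=True)
                # ... existing nvOCDR preprocessing / inference unchanged ...
                responses.append(pb_utils.InferenceResponse(output_tensors=[]))
            return responses

    Since nvidia-smi is not available on Jetson, watching the GR3D_FREQ field of tegrastats while requests are in flight is another quick way to confirm the GPU is doing the work.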

  2. Regarding the usage of pynvjpeg, I get the following CUDA error from the server (full client log and traceback below, followed by a sketch of what I think is happening). Any insights on this?

    root@ubuntu:/enhancement# python3 client.py -d /data/images/test_img/ -bs 1 --url localhost:8001
    /usr/local/lib/python3.8/dist-packages/tritongrpcclient/__init__.py:33: DeprecationWarning: The package `tritongrpcclient` is deprecated and will be removed in a future version. Please use instead `tritonclient.grpc`
      warnings.warn(
    [nvOCDR] Find total 2 images in /data/images/test_img/
    Initializing CUDA
    NvMMLiteBlockCreate : Block : BlockType = 256 
    [JPEG Decode] BeginSequence Display WidthxHeight 1118x1063
    NvMMLiteBlockCreate : Block : BlockType = 1 
    [nvOCDR] Processing for: /data/images/test_img/scene_text.jpg, image size: (1063, 1118, 3)
    Traceback (most recent call last):
      File "client.py", line 147, in <module>
        results = triton_client.infer(model_name=args.model_name,
      File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_client.py", line 1572, in infer
        raise_error_grpc(rpc_error)
      File "/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
        raise get_error_grpc(rpc_error) from None
    tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'nvOCDR', message: LogicError: cuFuncSetBlockShape failed: invalid resource handle
    
    At:
      /usr/local/lib/python3.8/dist-packages/pycuda/driver.py(481): function_call
      /opt/nvocdr/ocdr/triton/utils/cuda_resize_keep_AR.py(169): image_resize
      /opt/nvocdr/ocdr/triton/utils/process.py(87): preprocess
      /opt/nvocdr/ocdr/triton/models/nvOCDR/1/model.py(160): execute
    
    [JPEG Decode] NvMMLiteJPEGDecBlockPrivateClose done
    [JPEG Decode] NvMMLiteJPEGDecBlockClose done
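
    In case it is useful context: my understanding is that this pycuda error typically appears when a kernel compiled in one CUDA context is launched while a different context is current, and the "Initializing CUDA" line above suggests pynvjpeg creates its own context on first use. Below is a minimal sketch of the pattern as I understand it, assuming cuda_resize_keep_AR.py builds its resize kernel with pycuda's SourceModule; the kernel is illustrative, not the actual nvOCDR one.

    # Sketch: compile and launch inside one explicitly managed context so a
    # second library's context cannot invalidate the kernel handle.
    import numpy as np
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    cuda.init()
    # The device's primary context is shared with CUDA-runtime-based
    # libraries, which often avoids the mismatch entirely.
    ctx = cuda.Device(0).retain_primary_context()
    ctx.push()  # make it current in this thread
    try:
        mod = SourceModule("""
        __global__ void scale(float *x, float f) { x[threadIdx.x] *= f; }
        """)
        scale = mod.get_function("scale")  # handle is tied to ctx

        buf = cuda.mem_alloc(32 * 4)
        cuda.memcpy_htod(buf, np.ones(32, dtype=np.float32))
        # If another library pushed its own context between compile and
        # launch, this call would fail with "invalid resource handle".
        scale(buf, np.float32(2.0), block=(32, 1, 1), grid=(1, 1))

        out = np.empty(32, dtype=np.float32)
        cuda.memcpy_dtoh(out, buf)
        print(out[:4])  # -> [2. 2. 2. 2.]
    finally:
        ctx.pop()  # balance the push() on every code path

    If that is indeed the cause, pushing the kernel's own context right before the launch in image_resize (or building everything on the primary context) should avoid the mismatch.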
    
  3. Inference speed - Inference without pynvjpeg works fine; however, the per-file inference time printed by nvOCDR on the server is usually above 100-200 ms. Image sizes vary between ~300x300 and 1200x1000. Is this inference time expected? (Rough client-side timing sketch below.)
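
    For reference, this is roughly how the latency could be timed from the client side to separate gRPC/serialization overhead from the server-side time printed by nvOCDR (the model input name, dtype, and shape are placeholders; the real client.py sends its own input format):

    # Rough client-side timing sketch against a running Triton server.
    import time
    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # Placeholder input; the actual nvOCDR input name/layout may differ.
    img = np.random.randint(0, 255, (1, 1063, 1118, 3), dtype=np.uint8)
    inp = grpcclient.InferInput("input_image", list(img.shape), "UINT8")
    inp.set_data_from_numpy(img)

    # Warm up once so lazy initialization is excluded from the measurement.
    client.infer(model_name="nvOCDR", inputs=[inp])

    n = 20
    t0 = time.perf_counter()
    for _ in range(n):
        client.infer(model_name="nvOCDR", inputs=[inp])
    print(f"mean end-to-end latency: {(time.perf_counter() - t0) / n * 1e3:.1f} ms")

    One thing to keep in mind is that nvOCDR runs both OCDNet text detection and OCRNet recognition per image, so the per-file number covers two engines plus pre/post-processing rather than a single inference.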

abhay-iy97 · Apr 26 '24 23:04

@morganh-nv @Bin-NV to check TritonServer issue

Tyler-D · Apr 28 '24 00:04

We verify on dGPU machines only. You can refer to the Dockerfile.

morganh-nv · Apr 29 '24 09:04

Not actively working on this issue. Will close for now. Thank you.

abhay-iy97 · Jun 12 '24 17:06