
# High Latency in TensorRT Node for Image Segmentation on Jetson Orin Nano 8GB

Open · eterry-devops opened this issue on Aug 25, 2025 · 0 comments

## Issue Description

I'm experiencing unexpectedly high latency when running image segmentation with Isaac ROS DNN Inference on a Jetson Orin Nano 8GB. The TensorRT node appears to be the primary bottleneck, with processing delays averaging ~240-260 ms.

## Environment

- Hardware: Jetson Orin Nano 8GB
- Model: PeopleSemSegNet (deployable_quantized_vanilla_unet_onnx_v2.0)
- Isaac ROS version: 3.2
- JetPack version: 6.2
- CUDA version: 12.6

## Command Used

```bash
ros2 launch isaac_ros_examples isaac_ros_examples.launch.py \
  launch_fragments:=zed_mono_rect,unet \
  engine_file_path:=${ISAAC_ROS_WS}/isaac_ros_assets/models/peoplesemsegnet/deployable_quantized_vanilla_unet_onnx_v2.0/1/model.plan \
  input_binding_names:=['input_1:0'] \
  output_binding_names:=['argmax_1'] \
  network_output_type:='argmax' \
  interface_specs_file:=${ISAAC_ROS_WS}/isaac_ros_assets/isaac_ros_unet/zed2_quickstart_interface_specs.json
```

## Performance Measurements

### TensorRT Node Input (`/tensor_sub`)

```bash
ros2 topic delay /tensor_sub --window 100
```

- Average delay: ~240-260 ms
- Min: 108 ms
- Max: 348 ms
- Std dev: ~0.04 s

### TensorRT Node Output (`/tensor_pub`)

```bash
ros2 topic delay /tensor_pub --window 100
```

- Average delay: ~200-214 ms
- Min: 123 ms
- Max: 287 ms
- Std dev: ~0.03 s
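
For reference, the measurements above can be affected by the active power mode and clock state of the board; the standard JetPack tools can confirm the configuration before re-measuring (a minimal sketch, sudo access assumed):

```bash
# Query the active nvpmodel power profile; a low-power profile will
# noticeably inflate inference latency on the Orin Nano
sudo nvpmodel -q

# Show the current clock configuration, then pin clocks to maximum
# so that repeated latency measurements are comparable
sudo jetson_clocks --show
sudo jetson_clocks
```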

## Analysis

The latency measurements show that:

- Total pipeline latency averages 240-260 ms
- The TensorRT node itself appears to add significant processing time (see the standalone engine benchmark sketch below)
- There is considerable variance in processing times (std dev ~40 ms on the input side)
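
To separate raw engine execution time from ROS transport and NITROS overhead, the same engine file could be benchmarked standalone with trtexec. The sketch below assumes the default trtexec location on JetPack and reuses the engine path from the launch command above:

```bash
# Benchmark the serialized engine directly, outside the ROS graph, to see
# how much of the ~240-260 ms is pure GPU inference time
/usr/src/tensorrt/bin/trtexec \
  --loadEngine=${ISAAC_ROS_WS}/isaac_ros_assets/models/peoplesemsegnet/deployable_quantized_vanilla_unet_onnx_v2.0/1/model.plan \
  --iterations=100 \
  --avgRuns=100
```

If trtexec reports a per-inference latency well below the node-level delays measured above, the remaining time is more likely spent in message transport, preprocessing, or host/device copies than in the engine itself.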

## Expected Behavior

For a Jetson Orin Nano 8GB running a quantized UNet model, I would expect much lower latency, ideally in the 140-160 ms range from image to segmentation mask for real-time performance.

