isaac_ros_dnn_inference
High Latency in TensorRT Node for Image Segmentation on Jetson Orin Nano 8GB
Issue Description

I'm experiencing unexpectedly high latency when running image segmentation using Isaac ROS DNN Inference on a Jetson Orin Nano 8GB. The TensorRT node appears to be the primary bottleneck, with processing delays averaging ~240-260 ms.

Environment
- Hardware: Jetson Orin Nano 8GB
- Model: PeopleSemSegNet (deployable_quantized_vanilla_unet_onnx_v2.0)
- Isaac ROS Version: 3.2
- JetPack Version: 6.2
- CUDA Version: 12.6
Command Used
```bash
ros2 launch isaac_ros_examples isaac_ros_examples.launch.py \
  launch_fragments:=zed_mono_rect,unet \
  engine_file_path:=${ISAAC_ROS_WS}/isaac_ros_assets/models/peoplesemsegnet/deployable_quantized_vanilla_unet_onnx_v2.0/1/model.plan \
  input_binding_names:=['input_1:0'] \
  output_binding_names:=['argmax_1'] \
  network_output_type:='argmax' \
  interface_specs_file:=${ISAAC_ROS_WS}/isaac_ros_assets/isaac_ros_unet/zed2_quickstart_interface_specs.json
```
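As a sanity check on this configuration, the serialized plan can be inspected offline to confirm that the I/O tensor names and shapes match the `input_binding_names`/`output_binding_names` passed above. The following is only a sketch; it assumes the TensorRT 10.x Python bindings that ship with JetPack 6.x and reuses the asset path from the launch command:

```python
# Sketch: inspect the serialized engine to confirm its I/O tensor names
# ("input_1:0", "argmax_1") and shapes match the launch arguments.
# Assumption: TensorRT 10.x Python bindings as shipped with JetPack 6.x.
import os
import tensorrt as trt

plan_path = os.path.expandvars(
    '${ISAAC_ROS_WS}/isaac_ros_assets/models/peoplesemsegnet/'
    'deployable_quantized_vanilla_unet_onnx_v2.0/1/model.plan')

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(plan_path, 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    mode = engine.get_tensor_mode(name)      # INPUT or OUTPUT
    shape = engine.get_tensor_shape(name)
    dtype = engine.get_tensor_dtype(name)
    print(f'{mode.name:>6} {name} shape={tuple(shape)} dtype={dtype}')
```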
Performance Measurements
TensorRT Node Input (/tensor_sub)
```bash
ros2 topic delay /tensor_sub --window 100
```

- Average delay: ~240-260 ms
- Min: 108 ms
- Max: 348 ms
- Std dev: ~0.04 s (~40 ms)
TensorRT Node Output (/tensor_pub)

```bash
ros2 topic delay /tensor_pub --window 100
```

- Average delay: ~200-214 ms
- Min: 123 ms
- Max: 287 ms
- Std dev: ~0.03 s (~30 ms)
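As an independent cross-check of the `ros2 topic delay` figures, the same header-stamp age can be computed in a small rclpy node. This is a sketch under assumptions: the tensor topics are assumed to carry `isaac_ros_tensor_list_interfaces/msg/TensorList` (any header-stamped type works the same way), and the default QoS is assumed to be compatible with the publisher:

```python
# Sketch: measure header-stamp age on /tensor_pub directly in rclpy.
# Assumption: topic type is isaac_ros_tensor_list_interfaces/msg/TensorList,
# which carries a std_msgs/Header (any header-stamped type behaves identically).
import rclpy
from rclpy.node import Node
from rclpy.time import Time
from isaac_ros_tensor_list_interfaces.msg import TensorList


class TensorDelayMonitor(Node):
    def __init__(self):
        super().__init__('tensor_delay_monitor')
        self._samples_ms = []
        self.create_subscription(TensorList, '/tensor_pub', self._on_msg, 10)

    def _on_msg(self, msg):
        # Age = time of arrival minus the stamp set by the upstream publisher.
        age = self.get_clock().now() - Time.from_msg(msg.header.stamp)
        self._samples_ms.append(age.nanoseconds / 1e6)
        if len(self._samples_ms) == 100:  # mirror --window 100
            s = self._samples_ms
            mean = sum(s) / len(s)
            std = (sum((x - mean) ** 2 for x in s) / len(s)) ** 0.5
            self.get_logger().info(
                f'avg {mean:.1f} ms  min {min(s):.1f}  max {max(s):.1f}  std {std:.1f}')
            self._samples_ms.clear()


def main():
    rclpy.init()
    rclpy.spin(TensorDelayMonitor())


if __name__ == '__main__':
    main()
```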
Analysis

The latency measurements show that:

- Total pipeline latency is averaging 240-260 ms
- The TensorRT node itself appears to be adding significant processing time
- There's considerable variance in processing times (std dev ~40 ms on input)
Expected Behavior

For a Jetson Orin Nano 8GB with a quantized UNet model, I would expect much lower latency, ideally in the range of 140-160 ms from image to mask, for real-time performance.
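For context on what this latency means in terms of "real-time", a back-of-envelope conversion from end-to-end latency to frames of lag (the 30 Hz camera rate below is an assumption; the ZED can be configured differently):

```python
# Back-of-envelope: how many camera frames the published mask lags behind
# the live image at a given end-to-end latency. 30 Hz is an assumed camera rate.
camera_hz = 30.0
frame_period_ms = 1000.0 / camera_hz  # ~33.3 ms between frames

for latency_ms in (150, 250):  # expected vs. observed ballpark
    frames_behind = latency_ms / frame_period_ms
    print(f'{latency_ms} ms end-to-end ~= {frames_behind:.1f} frames behind at {camera_hz:.0f} Hz')
```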