TensorRT
Internal Error (/lightglue/ArgMax)
When I run the model with TensorRT 8.6.2 and CUDA 12.2 on a Jetson device, I get a lot of errors (Ubuntu 22.04, JetPack 6.0 R36.2, https://developer.nvidia.com/embedded/jetpack#collapseAllJetson ):
[E] [TRT] 7: [shapeMachine.cpp::nvinfer1::rt::ShapeMachineRoutine::executeContinuation::864] Error Code 7: Internal Error (/lightglue/ArgMax: length of reduction axis ((MIN 2048 (# 0 (VALUE /extractor_1/NonZero_1[size])))) is smaller than K (1) Condition '==' violated: 0 != 1. Instruction: CHECK_EQUAL 0 1.)
There is no such issue when I run the model with TensorRT 8.6.1.6 on Windows.
The model is https://github.com/fabio-sim/LightGlue-ONNX/releases/download/v0.1.3/superpoint_2048_lightglue_end2end.onnx from https://github.com/fabio-sim/LightGlue-ONNX/releases/tag/v0.1.3
Thanks!
-Scott
I can see the model has dynamic-shape inputs. What are the intended input shapes? Could you please share a trtexec/polygraphy command that reproduces the error? Thanks!
/usr/bin/trtexec --workspace=40960 --onnx=/home/dev/projects/models/superpoint_2048_lightglue_end2end_tensorrt.onnx --saveEngine=/home/dev/projects/models/superpoint_2048_lightglue_end2end_tensorrt_1x1x3840x640_1x1x3840x128.fp32.trt8.v8.6.2.3.sm87.engine --minShapes=image0:1x1x3840x640,image1:1x1x3840x128 --optShapes=image0:1x1x3840x640,image1:1x1x3840x128 --maxShapes=image0:1x1x3840x640,image1:1x1x3840x128
Could you please provide a log with --verbose from the Jetson? I don't have a Jetson on hand.
BTW, does it work with a static shape, e.g. specifying only --optShapes?
minShapes, optShapes and maxShapes are all the same.
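For anyone trying to reproduce this, here is a minimal Python sketch that assembles a static-shape trtexec invocation with --verbose. The helper name `build_trtexec_cmd` is hypothetical and only for illustration. It uses the --onnx, --optShapes, --saveEngine and --verbose flags, and it assumes trtexec is on the PATH and the model file is in the current directory.

```python
import shlex


def build_trtexec_cmd(onnx_path, shapes, engine_path=None, verbose=True):
    """Build a trtexec argument list that gives every input one fixed shape.

    `shapes` maps an input tensor name to its dimensions, for example
    {"image0": (1, 1, 3840, 640)}. Only --optShapes is emitted here.
    """
    spec = ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in shapes.items()
    )
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--optShapes={spec}"]
    if engine_path:
        cmd.append(f"--saveEngine={engine_path}")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_trtexec_cmd(
    "superpoint_2048_lightglue_end2end.onnx",
    {"image0": (1, 1, 3840, 640), "image1": (1, 1, 3840, 128)},
)
# Print a shell-ready command line; run it manually or pass `cmd` to subprocess.run.
print(shlex.join(cmd))
```

Redirecting the output of the printed command to a file gives a verbose log that can be attached to the issue.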
With --verbose set, this is the error message: [E] [TRT] 7: [shapeMachine.cpp::executeContinuation::864] Error Code 7: Internal Error (/lightglue/ArgMax_1: length of reduction axis ((MIN 2048 (# 0 (VALUE /extractor/NonZero_1[size])))) is smaller than K (1) Condition '==' violated: 0 != 1. Instruction: CHECK_EQUAL 0 1.)
&&&& PASSED TensorRT.trtexec [TensorRT v100000] # trtexec --onnx=superpoint_2048_lightglue_end2end.onnx --optShapes=image0:1x1x3840x640,image1:1x1x3840x128
Looks like this is fixed in 10.0, but I don't know when the JetPack release that ships with TRT 10 will come out.
The x86 version should be released soon.
BTW, I saw many warnings like:
[03/10/2024-08:01:36] [W] [TRT] /lightglue/self_attn.8/Reshape_14: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[03/10/2024-08:01:36] [W] [TRT] /lightglue/self_attn.8/Reshape_18: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 0 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[03/10/2024-08:01:36] [W] [TRT] /lightglue/self_attn.8/Reshape_15: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 2 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[03/10/2024-08:01:36] [W] [TRT] /lightglue/self_attn.8/Reshape_19: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 2 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
[03/10/2024-08:01:36] [W] [TRT] /lightglue/log_assignment.8/Reshape_7: IShuffleLayer with zeroIsPlaceHolder=true has reshape dimension at position 2 that might or might not be zero. TensorRT resolves it at runtime, but this may cause excessive memory consumption and is usually a sign of a bug in the network.
I built the engines with the static shape you mentioned on both Windows and Jetson Orin (JetPack 6.0 DP) with TensorRT 8.6. Both engine builds complete, and the build logs are attached. Orin still shows the same error message, while Windows does not. TensorRT-8.6.1.6.windows.txt TensorRT-8.6.2.3.jetson_orin.dp6.txt
ONNX Reshape has an attribute “allowzero” that turns off “zero as placeholder” when set to 1. Try setting it and then check whether the warning is still there.
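To make the "zero as placeholder" behaviour concrete, here is a small pure-Python sketch of how the ONNX Reshape operator resolves its target shape with and without allowzero. The function `resolve_reshape` is a hypothetical illustration of the operator's documented semantics. It is not TensorRT or ONNX code, and it does not verify that element counts match.

```python
def resolve_reshape(input_shape, target, allowzero=0):
    """Resolve an ONNX-style Reshape target shape against an input shape.

    allowzero=0 (default): a 0 in `target` copies the input dimension at
    the same position ("zero as placeholder").
    allowzero=1: a 0 in `target` is a literal zero-sized dimension.
    A single -1 is inferred from the remaining element count.
    """
    if allowzero and 0 in target and -1 in target:
        raise ValueError("with allowzero=1, shape cannot contain both 0 and -1")

    out = []
    for i, d in enumerate(target):
        if d == 0 and not allowzero:
            out.append(input_shape[i])  # placeholder: copy from the input
        else:
            out.append(d)

    if -1 in out:
        total = 1
        for d in input_shape:
            total *= d
        known = 1
        for d in out:
            if d != -1:
                known *= d
        if known == 0:
            raise ValueError("cannot infer -1 when another dimension is 0")
        out[out.index(-1)] = total // known
    return tuple(out)


# Placeholder semantics: the leading 0 copies the input's first dimension.
print(resolve_reshape((2, 3, 4), (0, -1)))
# Literal semantics: the leading 0 stays an explicit zero-sized dimension.
print(resolve_reshape((2, 0, 4), (0, 0, 4), allowzero=1))
```

When a dimension may or may not be zero at runtime, as with a variable number of detected keypoints, the two modes can give different shapes. That ambiguity is what the IShuffleLayer warning points at.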
@chenscottus have you tried building on a Jetson Nano? If so, do you have any tips on how to do it?