ReduceMax failure of TensorRT 8.5.10 when converting an ONNX file to an engine file with trtexec on GPU Orin
Description
I tried to generate an engine file from an ONNX file on the Orin GPU, but it failed:

[05/15/2024-11:45:16] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[05/15/2024-11:45:16] [E] Saving engine to file failed.
[05/15/2024-11:45:16] [E] Engine set up failed
Environment
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
Please add --verbose to get a more detailed log.
Hi, I replaced the original nn.LayerNorm block with an nn.BatchNormalization block. Now my new network's ONNX file is:
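(For reference, a minimal sketch of the kind of swap described here, assuming a simple Linear -> norm -> ReLU subgraph like the one visible in the log below; the module name, file name, and shapes are hypothetical:)

```python
import torch
import torch.nn as nn

class Subgraph(nn.Module):
    """Hypothetical subgraph: Linear -> normalization -> ReLU."""
    def __init__(self, in_dim=12, hidden=64, use_batchnorm=True):
        super().__init__()
        self.linear = nn.Linear(in_dim, hidden)
        # LayerNorm normalizes over the last dim; BatchNorm1d normalizes over
        # the channel dim, so the input must be transposed around it.
        self.norm = nn.BatchNorm1d(hidden) if use_batchnorm else nn.LayerNorm(hidden)
        self.act = nn.ReLU()

    def forward(self, x):                      # x: (N, L, in_dim)
        y = self.linear(x)                     # (N, L, hidden)
        if isinstance(self.norm, nn.BatchNorm1d):
            y = self.norm(y.transpose(1, 2)).transpose(1, 2)
        else:
            y = self.norm(y)
        return self.act(y)

# Export the BatchNorm variant (eval mode so running stats are used):
model = Subgraph(use_batchnorm=True).eval()
torch.onnx.export(model, torch.randn(12, 20, 12), "subgraph_bn.onnx",
                  input_names=["x"], output_names=["y"], opset_version=13)
```

Note that with opset 13, LayerNorm is exported as a decomposed subgraph of primitive ops (there is no LayerNormalization op before opset 17), while BatchNorm1d in eval mode exports as a single ONNX BatchNormalization node.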
According to the document "https://github.com/NVIDIA/Deep-Learning-Accelerator-SW/tree/main/operators", the BatchNormalization operator is natively supported by NVIDIA DLA, but when I try to generate an engine file from the ONNX file, it still fails. The end of the log is here:
[05/15/2024-20:44:50] [V] [TRT] Layer: MaxPool_5 Host Persistent: 1408 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_12 Host Persistent: 6752 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_13 || Gemm_14 Host Persistent: 5664 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_15 Host Persistent: 6752 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: PWN(onnx::Div_41 + (Unnamed Layer* 33) [Shuffle], Div_17) Host Persistent: 244 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_19 Host Persistent: 6048 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_20 Host Persistent: 6048 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Layer: Gemm_21 Host Persistent: 6048 Device Persistent: 0 Scratch Memory: 0
[05/15/2024-20:44:50] [V] [TRT] Skipped printing memory information for 22 layers with 0 memory size i.e. Host Persistent + Device Persistent + Scratch Memory == 0.
[05/15/2024-20:44:50] [I] [TRT] Total Host Persistent Memory: 45280
[05/15/2024-20:44:50] [I] [TRT] Total Device Persistent Memory: 0
[05/15/2024-20:44:50] [I] [TRT] Total Scratch Memory: 0
[05/15/2024-20:44:50] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 132 MiB
[05/15/2024-20:44:50] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 29 steps to complete.
[05/15/2024-20:44:50] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.337024ms to assign 7 blocks to 29 nodes requiring 126464 bytes.
[05/15/2024-20:44:50] [V] [TRT] Total number of blocks in optimized block assignment: 7
[05/15/2024-20:44:50] [I] [TRT] Total Activation Memory: 126464
[05/15/2024-20:44:50] [V] [TRT] Finalize: MatMul_0 Set kernel index: 0
[05/15/2024-20:44:50] [V] [TRT] Finalize: MaxPool_5 Set kernel index: 1
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_12 Set kernel index: 2
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_13 || Gemm_14 Set kernel index: 3
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_15 Set kernel index: 2
[05/15/2024-20:44:50] [V] [TRT] Finalize: PWN(onnx::Div_41 + (Unnamed Layer* 33) [Shuffle], Div_17) Set kernel index: 4
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_19 Set kernel index: 5
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_20 Set kernel index: 6
[05/15/2024-20:44:50] [V] [TRT] Finalize: Gemm_21 Set kernel index: 6
[05/15/2024-20:44:50] [V] [TRT] Total number of generated kernels selected for the engine: 7
[05/15/2024-20:44:50] [V] [TRT] Kernel: 0 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 1 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 2 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 3 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 4 TRT_SERIALIZABLE:generatedNativePointwise
[05/15/2024-20:44:50] [V] [TRT] Kernel: 5 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Kernel: 6 CASK_STATIC
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: CUDNN
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: CUBLAS, CUBLAS_LT
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[05/15/2024-20:44:50] [V] [TRT] Disabling unused tactic source: JIT_CONVOLUTIONS
[05/15/2024-20:44:50] [V] [TRT] Engine generation completed in 10.7422 seconds.
[05/15/2024-20:44:50] [V] [TRT] Deleting timing cache: 141 entries, served 42 hits since creation.
[05/15/2024-20:44:50] [V] [TRT] Engine Layer Information:
Layer(NoOp): reshape_before_MatMul_0, Tactic: 0x0000000000000000, x (Float[12,20,12]) -> reshape_before_MatMul_0_out_tensor (Float[240,12,1,1])
Layer(NoOp): Reformatting CopyNode for Input Tensor 0 to MatMul_0, Tactic: 0x0000000000000000, reshape_before_MatMul_0_out_tensor (Float[240,12,1,1]) -> Reformatted Input Tensor 0 to MatMul_0 (Float[240,12:4,1,1])
Layer(CaskGemmConvolution): MatMul_0, Tactic: 0x00000000000201d1, Reformatted Input Tensor 0 to MatMul_0 (Float[240,12:4,1,1]) -> MatMul_0_out_tensor (Float[240,64:4,1,1])
Layer(NoOp): Reformatting CopyNode for Input Tensor 0 to reshape_after_MatMul_0, Tactic: 0x0000000000000000, MatMul_0_out_tensor (Float[240,64:4,1,1]) -> Reformatted Input Tensor 0 to reshape_after_MatMul_0 (Float[240,64,1,1])
Layer(NoOp): reshape_after_MatMul_0, Tactic: 0x0000000000000000, Reformatted Input Tensor 0 to reshape_after_MatMul_0 (Float[240,64,1,1]) -> onnx::Add_25 (Float[12,20,64])
Layer(Constant): backbone.subgraph.linear.bias + (Unnamed Layer* 4) [Shuffle], Tactic: 0x0000000000000000, -> (Unnamed Layer* 4) [Shuffle]_output (Float[1,1,64])
Layer(ElementWise): Add_1, Tactic: 0x0000000000000001, (Unnamed Layer* 4) [Shuffle]_output (Float[1,1,64]), onnx::Add_25 (Float[12,20,64]) -> input (Float[12,20,64])
Layer(NoOp): (Unnamed Layer* 6) [Shuffle], Tactic: 0x0000000000000000, input (Float[12,20,64]) -> (Unnamed Layer* 6) [Shuffle]_output (Float[12,20,64,1])
Layer(Scale): BatchNormalization_2 + Relu_3, Tactic: 0x0000000000000000, (Unnamed Layer* 6) [Shuffle]_output (Float[12,20,64,1]) -> Relu_3_out_tensor (Float[12,20,64,1])
Layer(NoOp): squeeze_after_Relu_3, Tactic: 0x0000000000000000, Relu_3_out_tensor (Float[12,20,64,1]) -> squeeze_after_Relu_3_out_tensor (Float[12,20,64])
Layer(Shuffle): Transpose_4 + (Unnamed Layer* 11) [Shuffle], Tactic: 0x0000000000000000, squeeze_after_Relu_3_out_tensor (Float[12,20,64]) -> (Unnamed Layer* 11) [Shuffle]_output (Float[12,64,20,1])
Layer(CaskPooling): MaxPool_5, Tactic: 0x5faf4a0a8a5670ed, (Unnamed Layer* 11) [Shuffle]_output (Float[12,64,20,1]) -> (Unnamed Layer* 12) [Pooling]_output (Float[12,64,1,1])
Layer(NoOp): (Unnamed Layer* 13) [Shuffle] + Squeeze_6, Tactic: 0x0000000000000000, (Unnamed Layer* 12) [Pooling]_output (Float[12,64,1,1]) -> x.1 (Float[12,64])
Layer(Reformat): reshape_before_Gemm_12_copy_input, Tactic: 0x00000000000003e8, x.1 (Float[1,64]) -> reshape_before_Gemm_12_copy_input (Float[1,64])
Layer(NoOp): reshape_before_Gemm_12, Tactic: 0x0000000000000000, reshape_before_Gemm_12_copy_input (Float[1,64]) -> reshape_before_Gemm_12_out_tensor (Float[1,64,1,1])
Layer(CaskGemmConvolution): Gemm_12, Tactic: 0x000000000002034f, reshape_before_Gemm_12_out_tensor (Float[1,64,1,1]) -> Gemm_12_out_tensor (Float[1,32,1,1])
Layer(NoOp): reshape_after_Gemm_12, Tactic: 0x0000000000000000, Gemm_12_out_tensor (Float[1,32,1,1]) -> onnx::Gemm_37 (Float[1,32])
Layer(NoOp): reshape_before_Gemm_13, Tactic: 0x0000000000000000, x.1 (Float[12,64]) -> reshape_before_Gemm_13_out_tensor (Float[12,64,1,1])
Layer(CaskGemmConvolution): Gemm_13 || Gemm_14, Tactic: 0x00000000000204df, reshape_before_Gemm_13_out_tensor (Float[12,64,1,1]) -> Gemm_13 || Gemm_14 (Float[12,64,1,1])
Layer(Reformat): reshape_after_Gemm_13_copy_input, Tactic: 0x00000000000003e8, Gemm_13 || Gemm_14 (Float[12,32,1,1]) -> reshape_after_Gemm_13_copy_input (Float[12,32,1,1])
Layer(NoOp): reshape_after_Gemm_13, Tactic: 0x0000000000000000, reshape_after_Gemm_13_copy_input (Float[12,32,1,1]) -> onnx::Gemm_38 (Float[12,32])
Layer(Reformat): reshape_after_Gemm_14_copy_input, Tactic: 0x00000000000003e8, Gemm_13 || Gemm_14 (Float[12,32,1,1]) -> reshape_after_Gemm_14_copy_input (Float[12,32,1,1])
Layer(NoOp): reshape_after_Gemm_14, Tactic: 0x0000000000000000, reshape_after_Gemm_14_copy_input (Float[12,32,1,1]) -> onnx::Gemm_39 (Float[12,32])
Layer(CaskGemmMatrixMultiply): Gemm_15, Tactic: 0x000000000002034f, onnx::Gemm_37 (Float[1,32]), onnx::Gemm_38 (Float[12,32]) -> onnx::Div_40 (Float[1,12])
Layer(PointWiseV2): PWN(onnx::Div_41 + (Unnamed Layer* 33) [Shuffle], Div_17), Tactic: 0x000000000000001c, onnx::Div_40 (Float[1,12]) -> scores (Float[1,12])
Layer(CudaSoftMax): Softmax_18, Tactic: 0x00000000000003e9, scores (Float[1,12]) -> (Unnamed Layer* 36) [Softmax]_output (Float[1,12])
Layer(CaskGemmMatrixMultiply): Gemm_19, Tactic: 0x00000000000203be, (Unnamed Layer* 36) [Softmax]_output (Float[1,12]), onnx::Gemm_39 (Float[12,32]) -> onnx::Gemm_44 (Float[1,32])
Layer(NoOp): reshape_before_Gemm_20, Tactic: 0x0000000000000000, onnx::Gemm_44 (Float[1,32]) -> reshape_before_Gemm_20_out_tensor (Float[1,32,1,1])
Layer(CaskGemmConvolution): Gemm_20, Tactic: 0x000000000002014b, reshape_before_Gemm_20_out_tensor (Float[1,32,1,1]) -> Gemm_20_out_tensor (Float[1,32,1,1])
Layer(CaskGemmConvolution): Gemm_21, Tactic: 0x000000000002014b, Gemm_20_out_tensor (Float[1,32,1,1]) -> Gemm_21_out_tensor (Float[1,30,1,1])
Layer(NoOp): reshape_after_Gemm_21, Tactic: 0x0000000000000000, Gemm_21_out_tensor (Float[1,30,1,1]) -> reg (Float[1,30])
[05/15/2024-20:44:50] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +4, now: CPU 0, GPU 4 (MiB)
[05/15/2024-20:44:50] [E] Saving engine to file failed.
[05/15/2024-20:44:50] [E] Engine set up failed
Please check, and have a nice day.
And if I remove the LayerNorm or BatchNormalization block, I can successfully generate the engine file.
You can try to convert these two modules (the LayerNorm or BatchNormalization block, exported as a subgraph ONNX) separately.
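(A minimal sketch of such an isolation test, assuming the standard torch.onnx exporter; file names and input shapes are hypothetical:)

```python
import torch
import torch.nn as nn

# Export each normalization block alone so trtexec can be pointed at the
# isolated subgraph, e.g.: trtexec --onnx=layernorm_only.onnx --verbose
torch.onnx.export(nn.LayerNorm(64), torch.randn(12, 20, 64),
                  "layernorm_only.onnx",
                  input_names=["x"], output_names=["y"], opset_version=13)

# BatchNorm1d expects (N, C, L); eval mode exports the running statistics.
torch.onnx.export(nn.BatchNorm1d(64).eval(), torch.randn(12, 64, 20),
                  "batchnorm_only.onnx",
                  input_names=["x"], output_names=["y"], opset_version=13)
```

Whichever of the two single-op engines fails to build points at the offending block.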
[05/15/2024-20:44:50] [E] Saving engine to file failed.
No disk space left?
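(For what it's worth, a quick sketch for checking free space on the filesystem where trtexec writes the engine; the path is hypothetical:)

```python
import shutil

# Free space on the filesystem that holds the engine output directory.
total, used, free = shutil.disk_usage("/path/to/engine/dir")  # hypothetical path
print(f"free: {free / 2**20:.0f} MiB")
```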
Hi, thanks for your reply. I tried again with a new .pt and succeeded in creating the engine file. There is one more thing I would like to make clear: as of now, can we not use the LayerNormalization operator on DRIVE Orin unless we write a TensorRT plugin ourselves?
Please check our release notes; I think you need at least TRT 8.6 or 9.0, I can't remember exactly which one.
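(As a sketch, one way to check the installed version and whether the Python bindings expose a native normalization layer; treating add_normalization as the marker for that support is my assumption about newer TRT releases, not something stated in this thread:)

```python
import tensorrt as trt

print("TensorRT version:", trt.__version__)

# Assumption: newer TensorRT releases expose a native normalization layer
# on the network definition as add_normalization.
print("add_normalization available:",
      hasattr(trt.INetworkDefinition, "add_normalization"))
```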
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!