TensorRT
trtexec can't compile ONNX model with `!n->candidateRequirements.empty() failed. No supported formats for Unsqueeze`
Description
As reported in https://github.com/huggingface/optimum/issues/1735, a valid ONNX model fails with the latest TRT release:
[02/29/2024-10:22:33] [V] [TRT] After concat removal: 18 layers
[02/29/2024-10:22:33] [V] [TRT] Trying to split Reshape and strided tensor
[02/29/2024-10:22:33] [I] [TRT] Graph optimization time: 1.62121 seconds.
[02/29/2024-10:22:33] [V] [TRT] Building graph using backend strategy 2
[02/29/2024-10:22:33] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[02/29/2024-10:22:33] [V] [TRT] Constructing optimization profile number 0 [1/1].
[02/29/2024-10:22:33] [V] [TRT] Applying generic optimizations to the graph for inference.
[02/29/2024-10:22:33] [E] Error[2]: Assertion !n->candidateRequirements.empty() failed. No supported formats for /model/layers.0/self_attn/rotary_emb/Unsqueeze_1
[02/29/2024-10:22:33] [E] Error[2]: [optimizer.cpp::getFormatRequirements::3154] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. No supported formats for /model/layers.0/self_attn/rotary_emb/Unsqueeze_1)
[02/29/2024-10:22:33] [E] Engine could not be created from network
[02/29/2024-10:22:33] [E] Building engine failed
[02/29/2024-10:22:33] [E] Failed to create engine from model or file.
[02/29/2024-10:22:33] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=model_quantized.onnx --saveEngine=model.plan --minShapes=input_ids:1x400,attention_mask:1x400,position_ids:1x400 --optShapes=input_ids:16x400,attention_mask:16x400,position_ids:16x400 --maxShapes=input_ids:32x400,attention_mask:32x400,position_ids:32x400 --verbose --int8
I tried with both int32 and int64 input dtypes, and it does not seem to matter.
Environment
TensorRT Version: nvcr.io/nvidia/tensorrt:24.01-py3
NVIDIA GPU: A100-80GB
NVIDIA Driver Version: CUDA_DRIVER_VERSION=545.23.08
CUDA Version: CUDA_VERSION=12.3.2.001
CUDNN Version: CUDNN_VERSION=8.9.7.29+cuda12.2
Relevant Files
The model is 125 MB, larger than the 25 MB upload limit, so it is hosted here: https://huggingface.co/fxmarty/tiny-gemma-onnx-quantized-trt
Please fetch it with git clone https://huggingface.co/fxmarty/tiny-gemma-onnx-quantized-trt
Steps To Reproduce
Download the above model and run:
Commands or scripts: trtexec --onnx=model_quantized.onnx --saveEngine=model.plan --minShapes=input_ids:1x400,attention_mask:1x400,position_ids:1x400 --optShapes=input_ids:16x400,attention_mask:16x400,position_ids:16x400 --maxShapes=input_ids:32x400,attention_mask:32x400,position_ids:32x400 --verbose --int8
Can this model run on other frameworks? For example, running the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt): Yes, it works.
Would appreciate your help on this @zerollzeng.
Thanks, I can reproduce the issue and have filed internal bug 4544519 to track it.
Thanks @zerollzeng. This is high-priority for us. Any ETA on a fix?
We are working on this and will come back to you once we have progress. Thanks!
This will be fixed in TRT 10.0 EA, which should be released soon. Closing this bug.