
trtexec can't compile ONNX model with `!n->candidateRequirements.empty() failed. No supported formats for Unsqueeze`

Open fxmarty opened this issue 1 year ago • 2 comments

Description

As reported in https://github.com/huggingface/optimum/issues/1735, a valid ONNX model fails with the latest TRT release:

[02/29/2024-10:22:33] [V] [TRT] After concat removal: 18 layers
[02/29/2024-10:22:33] [V] [TRT] Trying to split Reshape and strided tensor
[02/29/2024-10:22:33] [I] [TRT] Graph optimization time: 1.62121 seconds.
[02/29/2024-10:22:33] [V] [TRT] Building graph using backend strategy 2
[02/29/2024-10:22:33] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[02/29/2024-10:22:33] [V] [TRT] Constructing optimization profile number 0 [1/1].
[02/29/2024-10:22:33] [V] [TRT] Applying generic optimizations to the graph for inference.
[02/29/2024-10:22:33] [E] Error[2]: Assertion !n->candidateRequirements.empty() failed. No supported formats for /model/layers.0/self_attn/rotary_emb/Unsqueeze_1
[02/29/2024-10:22:33] [E] Error[2]: [optimizer.cpp::getFormatRequirements::3154] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. No supported formats for /model/layers.0/self_attn/rotary_emb/Unsqueeze_1)
[02/29/2024-10:22:33] [E] Engine could not be created from network
[02/29/2024-10:22:33] [E] Building engine failed
[02/29/2024-10:22:33] [E] Failed to create engine from model or file.
[02/29/2024-10:22:33] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=model_quantized.onnx --saveEngine=model.plan --minShapes=input_ids:1x400,attention_mask:1x400,position_ids:1x400 --optShapes=input_ids:16x400,attention_mask:16x400,position_ids:16x400 --maxShapes=input_ids:32x400,attention_mask:32x400,position_ids:32x400 --verbose --int8

I tried both int32 and int64 input dtypes, and it does not seem to matter.
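For reference, the `--minShapes`/`--optShapes`/`--maxShapes` flags in the failing command use trtexec's `name:DIMxDIM,...` syntax. A stdlib-only sketch (this helper is hypothetical, not part of trtexec) that parses such a spec, handy for double-checking that all three profiles name the same inputs:

```python
def parse_shape_spec(spec: str) -> dict:
    """Parse a trtexec-style shape spec like
    'input_ids:1x400,attention_mask:1x400' into {name: (dims...)}."""
    shapes = {}
    for entry in spec.split(","):
        name, _, dims = entry.rpartition(":")
        shapes[name] = tuple(int(d) for d in dims.split("x"))
    return shapes

spec = "input_ids:16x400,attention_mask:16x400,position_ids:16x400"
print(parse_shape_spec(spec))
# {'input_ids': (16, 400), 'attention_mask': (16, 400), 'position_ids': (16, 400)}
```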

Environment

TensorRT Version: nvcr.io/nvidia/tensorrt:24.01-py3

NVIDIA GPU: A100-80GB

NVIDIA Driver Version: CUDA_DRIVER_VERSION=545.23.08

CUDA Version: CUDA_VERSION=12.3.2.001

CUDNN Version: CUDNN_VERSION=8.9.7.29+cuda12.2

Relevant Files

The model is 125 MB, which exceeds the 25 MB attachment limit, so it is hosted here: https://huggingface.co/fxmarty/tiny-gemma-onnx-quantized-trt

Please use `git clone https://huggingface.co/fxmarty/tiny-gemma-onnx-quantized-trt` to download it.

Steps To Reproduce

Download the above model and run:

Commands or scripts: trtexec --onnx=model_quantized.onnx --saveEngine=model.plan --minShapes=input_ids:1x400,attention_mask:1x400,position_ids:1x400 --optShapes=input_ids:16x400,attention_mask:16x400,position_ids:16x400 --maxShapes=input_ids:32x400,attention_mask:32x400,position_ids:32x400 --verbose --int8

Can this model run on other frameworks? For example, with ONNX Runtime (`polygraphy run <model.onnx> --onnxrt`): Yes, it works.
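When cross-checking ONNX Runtime output against a TensorRT engine, outputs are typically compared element-wise under absolute and relative tolerances. A minimal stdlib-only sketch of that kind of check (the function name and the tolerance values here are illustrative assumptions, not polygraphy's actual defaults):

```python
import math

def outputs_match(a, b, rtol=1e-3, atol=1e-3):
    """Element-wise comparison of two flat output sequences under
    relative/absolute tolerances, mirroring the style of check a
    tool like polygraphy performs when comparing backends."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rtol, abs_tol=atol)
        for x, y in zip(a, b)
    )

# Small numerical deviations between backends should pass:
print(outputs_match([1.0, 2.0], [1.0005, 2.0005]))
```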

fxmarty avatar Feb 29 '24 10:02 fxmarty

Would appreciate your help on this @zerollzeng.

michaelroyzen avatar Mar 02 '24 04:03 michaelroyzen

Thanks, I can reproduce the issue and have filed internal bug 4544519 to track this.

zerollzeng avatar Mar 03 '24 02:03 zerollzeng

Thanks @zerollzeng. This is high-priority for us. Any ETA on a fix?

michaelroyzen avatar Mar 05 '24 01:03 michaelroyzen

We are working on this and will come back to you once we have progress. Thanks!

zerollzeng avatar Mar 06 '24 01:03 zerollzeng

This will be fixed in TRT 10.0 EA, which should be released soon. Closing this bug.

zerollzeng avatar Mar 11 '24 08:03 zerollzeng