TensorRT
LayerNorm in FP16: accuracy is still insufficient
GPU: 4090
[06/18/2025-03:48:34] [I] TensorRT version: 10.10.0
[06/18/2025-03:48:34] [I] Loading standard plugins
[06/18/2025-03:48:34] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 26, GPU 390 (MiB)
[06/18/2025-03:48:37] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1749, GPU +8, now: CPU 1977, GPU 398 (MiB)
[06/18/2025-03:48:37] [I] Start parsing network model.
[06/18/2025-03:48:37] [I] [TRT] ----------------------------------------------------------------
[06/18/2025-03:48:37] [I] [TRT] Input filename: model.onnx
[06/18/2025-03:48:37] [I] [TRT] ONNX IR version: 0.0.8
[06/18/2025-03:48:37] [I] [TRT] Opset version: 17
[06/18/2025-03:48:37] [I] [TRT] Producer name: pytorch
[06/18/2025-03:48:37] [I] [TRT] Producer version: 2.7.1
[06/18/2025-03:48:37] [I] [TRT] Domain:
[06/18/2025-03:48:37] [I] [TRT] Model version: 0
[06/18/2025-03:48:37] [I] [TRT] Doc string:
[06/18/2025-03:48:37] [I] [TRT] ----------------------------------------------------------------
[06/18/2025-03:48:37] [W] [TRT] ModelImporter.cpp:503: Make sure input input_ids has Int64 binding.
[06/18/2025-03:48:37] [W] [TRT] ModelImporter.cpp:503: Make sure input attention_mask has Int64 binding.
[06/18/2025-03:48:39] [I] Finished parsing network model. Parse time: 2.17323
[06/18/2025-03:48:39] [I] Set shape of input tensor input_ids for optimization profile 0 to: MIN=1x1 OPT=8x100 MAX=128x1024
[06/18/2025-03:48:39] [I] Set shape of input tensor attention_mask for optimization profile 0 to: MIN=1x1 OPT=8x100 MAX=128x1024
[06/18/2025-03:48:42] [W] [TRT] Detected layernorm nodes in FP16.
[06/18/2025-03:48:42] [W] [TRT] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
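The overflow this warning describes is easy to reproduce with NumPy: the variance term of LayerNorm squares the activations and reduces over the hidden dimension, and an FP16 accumulator overflows past the FP16 maximum of 65504 long before an FP32 one does. The 30.0 activation magnitude below is an illustrative assumption (post-attention residual activations of this size are plausible); 1024 matches the hidden size of Qwen3-Embedding-0.6B.

```python
import numpy as np

hidden = 1024                             # hidden size of Qwen3-Embedding-0.6B
x = np.full(hidden, 30.0, dtype=np.float16)  # assumed activation magnitude

# LayerNorm's variance computation: square, then reduce over the hidden dim.
sq = x * x                                # 900.0 each -- still fine in FP16

var_sum_fp16 = sq.sum(dtype=np.float16)   # accumulate in FP16
var_sum_fp32 = sq.sum(dtype=np.float32)   # accumulate in FP32

print(var_sum_fp16)  # inf      (921600 exceeds the FP16 max of 65504)
print(var_sum_fp32)  # 921600.0
```

This is why TensorRT offers to keep the Reduce/Pow layers in FP32, and why opset >= 17 helps: with opset 17 the exporter emits a single ONNX `LayerNormalization` node, which TensorRT imports as an INormalizationLayer that accumulates internally in FP32 even when the surrounding network runs in FP16.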
Reproduction steps
In /usr/local/lib/python3.**/dist-packages/optimum/exporters/onnx/convert.py (line 467, def export_pytorch()), set opset=17.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", backend="onnx")
model.save_pretrained("export")
/usr/src/tensorrt/bin/trtexec --minShapes=input_ids:1x1,attention_mask:1x1 --optShapes=input_ids:8x100,attention_mask:8x100 --maxShapes=input_ids:128x1024,attention_mask:128x1024 --onnx=model.onnx --saveEngine=./model.plan --fp16
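If the INormalizationLayer path still leaves some reductions in FP16, one workaround is trtexec's precision-constraint flags: --precisionConstraints=obey makes the builder honor per-layer requests, and --layerPrecisions pins the named layers to FP32 while the rest of the network stays FP16. The layer-name pattern below is an assumption for illustration; take the real layer names from a --verbose build log or the engine inspector, and note that wildcard matching behavior can vary by TensorRT version.

```shell
/usr/src/tensorrt/bin/trtexec \
  --onnx=model.onnx \
  --saveEngine=./model.plan \
  --fp16 \
  --minShapes=input_ids:1x1,attention_mask:1x1 \
  --optShapes=input_ids:8x100,attention_mask:8x100 \
  --maxShapes=input_ids:128x1024,attention_mask:128x1024 \
  --precisionConstraints=obey \
  --layerPrecisions="*LayerNorm*":fp32   # hypothetical name pattern -- check --verbose output
```

The engine may run somewhat slower than pure FP16, since the pinned layers execute in FP32, but the LayerNorm reductions no longer overflow.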