
Setting layernorm to fp32 fails with TensorRT v8.6.11 when running trtexec on GPU A100


Description

I tried to convert an ONNX model to a TensorRT engine on an A100 with the layernorm layers explicitly set to FP32. However, the whole transformer block was fused into a single Myelin layer, inside which the final precision of the layernorm was still FP16.

(screenshot of the engine layer info attached)

detailed log: cvt.log

Environment

TensorRT Version: TensorRT v8.6.11

NVIDIA GPU: NVIDIA A100-SXM4-40GB

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts: trtexec --onnx=model/nvln.onnx --fp16 --noTF32 --saveEngine=model/nvln.exec.fp16.trt --layerPrecisions=LayerNormalization_*:fp32,Softmax_*:fp32,Conv_0:fp32 --layerOutputTypes=LayerNormalization_*:fp32,Softmax_*:fp32,Conv_0:fp32 --precisionConstraints=obey --timingCacheFile=x86.tc --exportLayerInfo=nvln.fp16.json --exportProfile=nvln.fp16.profile.json --profilingVerbosity=detailed --dumpProfile --verbose
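For reference, the same constraints can also be expressed through the TensorRT Python API instead of trtexec flags. A minimal sketch, assuming the parser-generated layer names contain the ONNX node names (the usual but not guaranteed behavior); the Myelin fusion reported in this issue may still override these hints:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model/nvln.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Equivalent of --precisionConstraints=obey
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Pin the layernorm layers to FP32, mirroring
# --layerPrecisions=LayerNormalization_*:fp32 --layerOutputTypes=LayerNormalization_*:fp32
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if "LayerNormalization" in layer.name:
        layer.precision = trt.DataType.FLOAT
        layer.set_output_type(0, trt.DataType.FLOAT)

engine = builder.build_serialized_network(network, config)
with open("model/nvln.exec.fp16.trt", "wb") as f:
    f.write(engine)
```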

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

jibf avatar May 22 '24 12:05 jibf

With our FP32 layernorm plugin, the model's accuracy is normal. With the FP16 Myelin fusion, accuracy drops by more than 20%.

Wuqiman avatar May 23 '24 01:05 Wuqiman

--layerPrecisions=LayerNormalization_*:fp32

Could you please try expanding the "*"? I'm not sure whether we support this kind of wildcard. You should be able to find the FP16 layernorm names in the verbose log; TRT should give you a warning that running layernorm under FP16 will affect accuracy.
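One way to expand the wildcard is to pull the exact LayerNormalization node names out of the ONNX graph and generate the flags explicitly. A minimal sketch, assuming the exporter gave the nodes non-empty names:

```python
import onnx

model = onnx.load("model/nvln.onnx")
ln_names = [n.name for n in model.graph.node
            if n.op_type == "LayerNormalization"]

# Emit fully expanded trtexec flags, one entry per layernorm node.
arg = ",".join(f"{name}:fp32" for name in ln_names)
print(f"--layerPrecisions={arg} --layerOutputTypes={arg}")
```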

zerollzeng avatar May 27 '24 01:05 zerollzeng

Closing since no activity for more than 3 weeks, please reopen if you still have questions. Thanks all!

ttyio avatar Jul 02 '24 16:07 ttyio

I'm having the same problem. Have you solved it?

Smarter-version avatar Jul 25 '24 01:07 Smarter-version

I want to modify the precision of the Add_3244 node to fp32, but it's wrapped in Myelin. Commands or scripts:

trtexec --onnx=$onnx_path \
  --saveEngine=$engine_path \
  --plugins=$plugins_path \
  --verbose --workspace=2048 \
  --exportProfile=${engine_path}.profile.json \
  --exportLayerInfo=${engine_path}.graph.json \
  --profilingVerbosity=detailed \
  --fp16 \
  --precisionConstraints=obey \
  --layerPrecisions=Add_3244:fp32 --layerOutputTypes=Add_3244:fp32

@zerollzeng

Smarter-version avatar Jul 25 '24 03:07 Smarter-version


@Smarter-version Perhaps you can try setting specific layers as output layers. Since TRT's output layer must be fp32, this indirectly achieves the goal of setting these layers to fp32.
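A minimal sketch of this workaround through the TensorRT Python API; the layer name Add_3244 is taken from the comment above and the model path is a placeholder, and whether the extra output actually escapes the Myelin fusion is not guaranteed:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read())

# Mark the tensor produced by the target layer as an extra network
# output, and pin that output's type to FP32.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name == "Add_3244":
        t = layer.get_output(0)
        network.mark_output(t)
        t.dtype = trt.DataType.FLOAT

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
engine = builder.build_serialized_network(network, config)
```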

jibf avatar Jul 29 '24 08:07 jibf


Thank you for your reply! I tried adding the outputs of some layers as outputs of the ONNX model and then converting it to an engine, but it fails with the error "No value info found for tensor".
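If that error comes from missing intermediate tensor metadata, running ONNX shape inference before promoting the tensor to a graph output may help. A minimal sketch; Add_3244 and the file names are placeholders, and for models over 2 GB onnx.shape_inference.infer_shapes_path would be needed instead:

```python
import onnx
from onnx import shape_inference

model = onnx.load("model.onnx")
# infer_shapes populates graph.value_info for intermediate tensors,
# which is what a "No value info found for tensor" lookup is missing.
inferred = shape_inference.infer_shapes(model)

# Promote the tensor produced by the Add_3244 node to a graph output.
node = next(n for n in inferred.graph.node if n.name == "Add_3244")
tensor_name = node.output[0]
vi = next(v for v in inferred.graph.value_info if v.name == tensor_name)
inferred.graph.output.append(vi)
onnx.save(inferred, "model_extra_output.onnx")
```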

Smarter-version avatar Jul 29 '24 08:07 Smarter-version