Zero Zeng
The difference between the FP32 TRT and FP32 ONNXRT outputs is very small.
I've filed internal bug 3813586 to track this.
This is not a bug. The model has a LayerNorm subgraph in it, and when it runs in fp16 the results differ between ORT and TRT. This happens because this subgraph's...
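The comment above is cut off before it names the cause. As an assumption for illustration only, one well-known way an FP16 LayerNorm can diverge from an FP32 reference is overflow in the squared-deviation sum. The largest finite FP16 value is 65504, so inputs around ±300 already produce squares (90000) that become infinity. The sketch below uses hypothetical helpers `to_fp16` and `layer_norm` that emulate FP16 by rounding after every step with Python's `struct` `"e"` (half-precision) format. It does not reproduce the actual TRT or ORT kernels, which may accumulate in higher precision, and the inputs are made up rather than taken from the model in this thread.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper).

    struct raises OverflowError for finite values too large for FP16,
    so those are mapped to +/-inf here to mimic FP16 overflow.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def layer_norm(xs, eps=1e-5, rnd=lambda v: v):
    """LayerNorm over a 1-D list, without scale/shift (hypothetical helper).

    `rnd` is applied after every arithmetic step to emulate the compute
    precision; the identity default keeps Python's float64.
    """
    n = len(xs)
    # Mean: accumulate the sum, rounding after each addition.
    total = 0.0
    for x in xs:
        total = rnd(total + rnd(x))
    mean = rnd(total / n)

    # Variance: sum of squared deviations, the step that can overflow.
    sq_total = 0.0
    for x in xs:
        d = rnd(rnd(x) - mean)
        sq_total = rnd(sq_total + rnd(d * d))
    var = rnd(sq_total / n)

    denom = rnd(math.sqrt(rnd(var + eps)))
    return [rnd(rnd(rnd(x) - mean) / denom) for x in xs]


xs = [-300.0, 300.0, -300.0, 300.0]
ref = layer_norm(xs)                 # float64 reference
out16 = layer_norm(xs, rnd=to_fp16)  # emulated FP16 compute
print("reference:", ref)
print("fp16     :", out16)
```

With these inputs the float64 path returns values very close to ±1, while the emulated FP16 path collapses to zeros: each squared deviation overflows to infinity, so the variance and the denominator become infinite.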
@nvpohanh ^ ^
I tried to reproduce the issue with TRT 8.4.1.5 using Polygraphy:
```
[I] onnxrt-runner-N0-09/07/22-08:16:25 | Completed 1 iteration(s) in 0.1693 ms | Average inference time: 0.1693 ms.
[I] Accuracy Comparison |...
```
I can reproduce this with
```
[I] trt-runner-N0-09/09/22-00:16:40: output | Stats: mean=0.35972, std-dev=0.34652, var=0.12008, median=0.27958, min=0 at (1, 0, 0), max=0.96826 at (0, 0, 1), avg-magnitude=0.35972
[I] ---- Histogram ----...
```
The issue has been fixed in TRT 8.5: there will be a preview feature that fixes it. Please wait for the upcoming 8.5 release :-)
@rajeevsrao @kevinch-nv Can you help check this? ^ ^
https://huggingface.co/docs/transformers/model_doc/bert or our [demoBert](https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT)?
I can reproduce this in TRT 8.5.0.9, but the issue goes away when I don't use dynamic shapes.
```
[I] trt-runner-N0-09/22/22-00:17:14 | Completed 1 iteration(s) in 17.49 ms | Average...
```