Discrepancy between TRT engines for same models - TensorRT Issue cross-reference
We are trying to reproduce the results for EfficientViT-SAM, but are having issues with certain models. In this case, we reported issues with l2_encoder.onnx. In summary, these are the results:
| Model | Precision | all | large | medium | small |
|---|---|---|---|---|---|
| L2 | FP16 | 0.0 | 0.0 | 0.0 | 0.0 |
| L2 | FP32 | 79.12 | 83.06 | 81.51 | 74.88 |
All details about the setup are available through the link: Accuracy failure of TensorRT 8.6.3 when running trtexec built engine on GPU RTX4090
I was able to resolve a similar issue by pinning some layers in the attention block to FP32 precision, which might help in this case as well. With that change I still retained a ~2x speedup over the full-FP32 model. See: https://github.com/mit-han-lab/efficientvit/issues/116#issuecomment-2138394567
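For reference, this kind of mixed-precision build can be expressed directly on the `trtexec` command line via `--layerPrecisions` together with `--precisionConstraints=obey`. A minimal sketch follows; the layer name `/blocks.0/attn/MatMul` is a hypothetical placeholder, and the actual names must be taken from the layers of your own `l2_encoder.onnx` graph:

```shell
# Sketch: build an FP16 engine while forcing suspect attention layers to FP32.
# "/blocks.0/attn/MatMul" is a placeholder name; substitute the real layer
# names from the ONNX graph of l2_encoder.onnx.
trtexec --onnx=l2_encoder.onnx \
        --fp16 \
        --precisionConstraints=obey \
        --layerPrecisions="/blocks.0/attn/MatMul":fp32 \
        --saveEngine=l2_encoder_mixed.plan
```

Comparing per-layer outputs between the FP32 and FP16 engines (e.g. with Polygraphy) is a practical way to narrow down which layers actually need to be pinned.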