Discrepancy between TRT engines for same models - TensorRT Issue cross-reference
We are trying to reproduce the results for EfficientViT-SAM, but are having issues with certain models. In this case, we reported issues with l2_encoder.onnx. In summary, these are the results:
| Model | Precision | all | large | medium | small |
|---|---|---|---|---|---|
| L2 | FP16 | 0.0 | 0.0 | 0.0 | 0.0 |
| L2 | FP32 | 79.12 | 83.06 | 81.51 | 74.88 |
All details about the setup are available through the link: Accuracy failure of TensorRT 8.6.3 when running trtexec built engine on GPU RTX4090
I was able to resolve a similar issue by pinning some layers in the attention block to FP32 precision, which might help in this case as well. With that change I still retained a ~2x speedup over the full-FP32 model. See: https://github.com/mit-han-lab/efficientvit/issues/116#issuecomment-2138394567
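For reference, this kind of mixed-precision build can be expressed directly on the `trtexec` command line via `--layerPrecisions` together with `--precisionConstraints=obey`. A minimal sketch follows; the layer name `/blocks.0/attn/MatMul` is a hypothetical placeholder, and the actual names must be taken from the layers of your own `l2_encoder.onnx` graph:

```shell
# Sketch: build an FP16 engine while forcing suspect attention layers to FP32.
# "/blocks.0/attn/MatMul" is a placeholder name; substitute the real layer
# names from the ONNX graph of l2_encoder.onnx.
trtexec --onnx=l2_encoder.onnx \
        --fp16 \
        --precisionConstraints=obey \
        --layerPrecisions="/blocks.0/attn/MatMul":fp32 \
        --saveEngine=l2_encoder_mixed.plan
```

Comparing per-layer outputs between the FP32 and FP16 engines (e.g. with Polygraphy) is a practical way to narrow down which layers actually need to be pinned.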