No speed improvement between FP16 and INT8 TensorRT models
Search before asking
- [X] I have searched the YOLOv5 issues and found no similar bug report.
YOLOv5 Component
Validation
Bug
When validating my YOLOv5n in both FP16 and INT8 precision, I see no speed improvement for the INT8 version, while accuracy and model size drop (which is expected!). I then checked with trtexec and again got the same latency: yolov5n.txt.
Since this does not happen with the latest YOLO models (where I see around a 20% latency improvement), I suspect that YOLOv5 has no operations that benefit from INT8 on my current architecture (i.e. FP16 is already fully optimized). Can you help me understand whether this is true or whether I am making a mistake?
Environment
- YOLO: YOLOv5n v7.0 fine-tuned on custom dataset
- TensorRT: 8.6.2.3
- Device: NVIDIA Jetson Orin Nano 8GB
Minimal Reproducible Example
python val.py --weights yolo5n.engine --data data.yaml --batch 16 --task test
python val.py --weights yolo5n-int8.engine --data data.yaml --batch 16 --task test
trtexec --loadEngine=yolo5n.engine --batch=1 --fp16
trtexec --loadEngine=yolo5n-int8.engine --batch=1 --best
Additional
Model files: models.zip
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
Hello @ingtommi, thank you for your interest in YOLOv5!
It looks like you're encountering an issue with performance differences between FP16 and INT8 TensorRT models. Since this appears to be a Bug Report, we would appreciate it if you could provide a minimum reproducible example (MRE) to assist us in debugging this issue. This could include specific commands you used, a small sample of your dataset, or any additional logs that might help clarify the problem.
Please also double-check your environment to ensure compatibility:
- Python version is 3.8 or higher.
- YOLOv5 dependencies are properly installed using the requirements.txt file.
- TensorRT and GPU drivers are updated and configured correctly for inference.
For debugging, it might be helpful to test using different hardware or TensorRT versions to see if the issue persists. If this is related to specific YOLOv5 configurations, please share more details about your setup or the customizations you have made.
An Ultralytics engineer will review this shortly and provide further assistance. Thank you for your patience!
YOLOv5 doesn't support INT8 TensorRT exports.
@Y-T-G I do not use this repo for TensorRT export; instead I convert the model to ONNX and then use a custom script to build the engines (and calibrate + quantize for INT8). The script is based on the TensorRT Python APIs and you can check it here.
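Roughly, the build step follows the standard TensorRT 8.x Python builder flow; a simplified sketch (file names, flags, and the calibrator argument are placeholders, not my exact script) is:

# Simplified sketch of an ONNX -> TensorRT engine build with the TensorRT 8.x Python API.
# File names and the calibrator object are placeholders, not the actual script.
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path, int8=False, calibrator=None):
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
    config.set_flag(trt.BuilderFlag.FP16)           # FP16 baseline
    if int8:
        config.set_flag(trt.BuilderFlag.INT8)       # allow INT8 kernels where available
        config.int8_calibrator = calibrator         # entropy calibrator fed with sample images

    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)

build_engine("yolov5n.onnx", "yolov5n.engine")  # FP16 engine
# build_engine("yolov5n.onnx", "yolov5n-int8.engine", int8=True, calibrator=my_calibrator)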
Does the benchmark with trtexec show a difference?
@Y-T-G no, you can check it yourself in the txt file I attached above.
It's probably not a bug then
@Y-T-G Yes, but I found nothing similar on the internet (no one comparing YOLOv5 FPS between FP16 and INT8), so I had to ask...
Someone mentioned there was 10% improvement
https://forums.developer.nvidia.com/t/the-inference-speed-of-yolov5-tensorrt-has-little-difference-between-int8-and-fp16/227183
@Y-T-G yeah sorry, I also found that one (it seems to be the only one). 10% is better than my 0%, but he also sees little difference in memory, while I go from 6.3 MB (FP16) to 4.7 MB (INT8).
Thank you for your detailed report and testing effort! Your observation about minimal or no speed improvement with INT8 on YOLOv5 compared to FP16 is valid and may be attributed to hardware and architectural factors. Some architectures, particularly on devices like the Jetson Orin Nano, show limited benefits from INT8 due to high FP16 optimization. YOLOv5's operations might not fully utilize INT8 optimizations compared to newer YOLO versions with refined quantization-aware designs.
If verifying on a different architecture still shows discrepancies, it might indicate that INT8 calibration settings could be suboptimal or the TensorRT INT8 kernel isn't fully leveraged for YOLOv5. For further exploration, ensure calibration data is diverse and representative of deployment inputs. Additionally, testing with dynamic batch sizes or alternate precision configurations (e.g., mixing INT8/FP16) could be insightful.
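As an illustration only (not an official Ultralytics utility), a minimal entropy calibrator that streams preprocessed, representative deployment images could look like the sketch below; the batch shape, preprocessing, and file names are assumptions:

# Hedged sketch: feeds preprocessed (N, 3, 640, 640) float32 batches to TensorRT for INT8 calibration.
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class ImageBatchCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = batches                      # list of NumPy arrays, NCHW, float32
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                             # signals end of calibration data
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)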
Let us know if you see different outcomes or need additional guidance! For reference, you can explore this TensorRT guide for further optimization techniques.
Can the method in this TensorRT guide be used to convert YOLOv5 to TensorRT format? I see that the document mainly mentions YOLO11.
Thanks for checking! While the TensorRT guide highlights YOLO11 examples, the same workflow applies to YOLOv5 models. Ensure you're using the latest ultralytics package and follow the export steps with your YOLOv5 model. For YOLOv5-specific TensorRT conversion, see our general export tutorial at Export YOLOv5 Model to TensorRT Format. Let us know if you hit any snags!
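For reference, the FP16 TensorRT export documented in that tutorial is run from the YOLOv5 repo roughly like this (weights path and device index are illustrative):

python export.py --weights yolov5n.pt --include engine --half --device 0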
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI!