No speed improvement between FP16 and INT8 TensorRT models
Search before asking
- [X] I have searched the YOLOv5 issues and found no similar bug report.
YOLOv5 Component
Validation
Bug
When validating my YOLOv5n in both FP16 and INT8 precision, I see no speed improvement for the INT8 version, while accuracy and model size drop (which is expected!). I then checked with trtexec and again got the same latency: yolov5n.txt.
Since this does not happen with the latest YOLO models (where I see around a 20% latency improvement), I suspect that YOLOv5 has no operations that benefit from INT8 on my current architecture (i.e. FP16 is already fully optimized). Can you help me understand whether this is true or whether I am making a mistake?
Environment
- YOLO: YOLOv5n v7.0 fine-tuned on custom dataset
- TensorRT: 8.6.2.3
- Device: NVIDIA Jetson Orin Nano 8GB
Minimal Reproducible Example
python val.py --weights yolo5n.engine --data data.yaml --batch 16 --task test
python val.py --weights yolo5n-int8.engine --data data.yaml --batch 16 --task test
trtexec --loadEngine=yolo5n.engine --batch=1 --fp16
trtexec --loadEngine=yolo5n-int8.engine --batch=1 --best
Additional
Model files: models.zip
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
Hello @ingtommi, thank you for your interest in YOLOv5!
It looks like you're encountering an issue with performance differences between FP16 and INT8 TensorRT models. Since this appears to be a Bug Report, we would appreciate it if you could provide a minimum reproducible example (MRE) to assist us in debugging this issue. This could include specific commands you used, a small sample of your dataset, or any additional logs that might help clarify the problem.
Please also double-check your environment to ensure compatibility:
- Python version is 3.8 or higher.
- YOLOv5 dependencies are properly installed using the requirements.txt file.
- TensorRT and GPU drivers are updated and configured correctly for inference.
For debugging, it might be helpful to test using different hardware or TensorRT versions to see if the issue persists. If this is related to specific YOLOv5 configurations, please share more details about your setup or the customizations you have made.
An Ultralytics engineer will review this shortly and provide further assistance. Thank you for your patience!
YOLOv5 doesn't support INT8 TensorRT exports.
@Y-T-G I do not use this repo for TensorRT export; instead I convert the model to ONNX and then use a custom script to build the engines (and calibrate + quantize for INT8). The script is based on the TensorRT Python APIs and you can check it here.
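Roughly, the build step follows the standard TensorRT 8.x Python builder flow; a simplified sketch (file names, flags, and the calibrator argument are placeholders, not my exact script) is:

# Simplified sketch of an ONNX -> TensorRT engine build with the TensorRT 8.x Python API.
# File names and the calibrator object are placeholders, not the actual script.
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path, int8=False, calibrator=None):
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
    config.set_flag(trt.BuilderFlag.FP16)           # FP16 baseline
    if int8:
        config.set_flag(trt.BuilderFlag.INT8)       # allow INT8 kernels where available
        config.int8_calibrator = calibrator         # entropy calibrator fed with sample images

    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)

build_engine("yolov5n.onnx", "yolov5n.engine")  # FP16 engine
# build_engine("yolov5n.onnx", "yolov5n-int8.engine", int8=True, calibrator=my_calibrator)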
Does the benchmark with trtexec show a difference?
@Y-T-G no, you can check it yourself in the txt file I attached above.
It's probably not a bug then
@Y-T-G Yes, but I found nothing similar on the internet (no one comparing YOLOv5 FPS between FP16 and INT8), so I had to ask...
Someone mentioned there was 10% improvement
https://forums.developer.nvidia.com/t/the-inference-speed-of-yolov5-tensorrt-has-little-difference-between-int8-and-fp16/227183
@Y-T-G yeah sorry, I also found that one (it seems to be the only one). 10% is better than my 0%, but he also sees little difference in memory, while I go from 6.3 MB (FP16) to 4.7 MB (INT8).
Thank you for your detailed report and testing effort! Your observation about minimal or no speed improvement with INT8 on YOLOv5 compared to FP16 is valid and may be attributed to hardware and architectural factors. Some architectures, particularly on devices like the Jetson Orin Nano, show limited benefits from INT8 due to high FP16 optimization. YOLOv5's operations might not fully utilize INT8 optimizations compared to newer YOLO versions with refined quantization-aware designs.
If verifying on a different architecture still shows discrepancies, it might indicate that INT8 calibration settings could be suboptimal or the TensorRT INT8 kernel isn't fully leveraged for YOLOv5. For further exploration, ensure calibration data is diverse and representative of deployment inputs. Additionally, testing with dynamic batch sizes or alternate precision configurations (e.g., mixing INT8/FP16) could be insightful.
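As an illustration only (not an official Ultralytics utility), a minimal entropy calibrator that streams preprocessed, representative deployment images could look like the sketch below; the batch shape, preprocessing, and file names are assumptions:

# Hedged sketch: feeds preprocessed (N, 3, 640, 640) float32 batches to TensorRT for INT8 calibration.
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class ImageBatchCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = batches                      # list of NumPy arrays, NCHW, float32
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                             # signals end of calibration data
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)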
Let us know if you see different outcomes or need additional guidance! For reference, you can explore this TensorRT guide for further optimization techniques.
Can the method in this TensorRT guide be used to convert YOLOv5 to TensorRT format? I see that the document mainly mentions YOLO11.
Thanks for checking! While the TensorRT guide highlights YOLO11 examples, the same workflow applies to YOLOv5 models. Ensure you're using the latest ultralytics package and follow the export steps with your YOLOv5 model. For YOLOv5-specific TensorRT conversion, see our general export tutorial at Export YOLOv5 Model to TensorRT Format. Let us know if you hit any snags!
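For reference, the FP16 TensorRT export documented in that tutorial is run from the YOLOv5 repo roughly like this (weights path and device index are illustrative):

python export.py --weights yolov5n.pt --include engine --half --device 0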
Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO and Vision AI!