Severe accuracy drop when using optimize_for_inference(dtype=torch.float16)
Search before asking
- [x] I have searched the RF-DETR issues and found no similar bug report.
Bug
Hi, thanks for your great work on RFDETR!
I encountered a severe accuracy drop when running inference with FP16.
Here are the details:
Tested models:
- Official RFDETRNano
- Official RFDETRSmall
- My custom-trained RFDETRNano (trained on my own dataset)

Code example:
from rfdetr import RFDETRNano
import torch
model = RFDETRNano()
model.optimize_for_inference(dtype=torch.float16)
# Run inference...
Once I apply optimize_for_inference(dtype=torch.float16), the inference accuracy drops significantly compared to FP32.
This happens across both official pretrained models and my own trained model.
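Here is roughly how I compared FP32 and FP16 side by side (a minimal sketch: 'demo.jpg' is a placeholder image, reading .confidence assumes predict returns a supervision Detections object, and detection counts/confidences are only a rough proxy for accuracy; a proper check would compute mAP on a validation set):
from PIL import Image
import torch
from rfdetr import RFDETRNano

image = Image.open('demo.jpg')  # placeholder test image

# Baseline: FP32 inference
fp32_model = RFDETRNano()
fp32_dets = fp32_model.predict(image, conf=0.2)

# Same model, optimized for FP16 inference
fp16_model = RFDETRNano()
fp16_model.optimize_for_inference(dtype=torch.float16)
fp16_dets = fp16_model.predict(image, conf=0.2)

# Compare detection counts and top confidences between the two runs
print('FP32:', len(fp32_dets), 'detections, top confidences:', fp32_dets.confidence[:5])
print('FP16:', len(fp16_dets), 'detections, top confidences:', fp16_dets.confidence[:5])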
Could you confirm if FP16 inference is supported, or if additional steps are needed to maintain accuracy?
Thanks!
Environment
- RF-DETR 1.2.0
- OS: both Ubuntu 22.04 and Windows 11
- Python 3.10
- torch 2.7.0
- CUDA 12.4
- GPU: both RTX 3060 and RTX A5000
Minimal Reproducible Example
from PIL import Image
import torch
from rfdetr import RFDETRNano

image = Image.open('demo.jpg')
model = RFDETRNano()
model.optimize_for_inference(dtype=torch.float16)
detections = model.predict(image, conf=0.2)
Additional
I also tried exporting the model to TorchScript and compared the accuracy between:
- using .half() before export
- without .half()
In both cases, I observed the same issue: inference accuracy drops significantly when running in FP16.
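For reference, this is roughly the shape of the TorchScript comparison I ran (a minimal sketch: the tiny stand-in module, the 640x640 input, and running on CUDA are assumptions, not the actual rf-detr export path):
import copy
import torch
import torch.nn as nn

# Stand-in module so the sketch runs; with rf-detr this would be the underlying
# detection network (how to obtain it from the wrapper is not shown here).
torch_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval().cuda()
dummy = torch.randn(1, 3, 640, 640, device='cuda')

# Export without .half() (FP32 weights)
ts_fp32 = torch.jit.trace(torch_model, dummy)

# Export with .half() applied before tracing (FP16 weights and inputs)
ts_fp16 = torch.jit.trace(copy.deepcopy(torch_model).half(), dummy.half())

with torch.no_grad():
    out_fp32 = ts_fp32(dummy)
    out_fp16 = ts_fp16(dummy.half()).float()

# With the real model, the FP16 outputs diverge enough to change the detections
print('max abs diff:', (out_fp32 - out_fp16).abs().max().item())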
Are you willing to submit a PR?
- [ ] Yes, I'd like to help by submitting a PR!
Same here; I used a custom-trained RFDETRMedium.
We're aware of the issue. It's the same thing that causes degradation in TensorRT, which is resolved by the approach in https://github.com/roboflow/rf-detr/issues/176, but I guess that isn't relevant for TorchScript.
Models trained on and exported from the Roboflow platform SHOULD work out of the box in FP16 with no accuracy decay, as we have a different implementation there. Once we have bandwidth, we'll release more of the on-platform implementation, which should resolve the issue. There's a very small team behind this project, so apologies for the delay.
btw how does it work if you use bfloat16?
Using bfloat16 fixes this issue and provides the same improvement.
model.optimize_for_inference(dtype=torch.bfloat16)
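For what it's worth, my guess as to why bfloat16 behaves better here: float16 has a much narrower dynamic range (max finite value around 65504), so large intermediate values can overflow, while bfloat16 keeps float32's exponent range at reduced mantissa precision. A quick illustration:
import torch

x = torch.tensor([70000.0, 1e-9])
print(x.half())      # large value overflows to inf, tiny value underflows to 0
print(x.bfloat16())  # both stay finite, just at reduced precision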
The new model definition should be much more stable in half precision.