
Finetuned RF-DETR small model outputs duplicate detections with high IoU

Open agvdndor opened this issue 3 weeks ago • 4 comments

Hi! We have finetuned an RF-DETR small model for object detection of a single custom class. In most cases it works very well, but there is one environment in which the model frequently outputs two detections with very high IoU (>0.9) for a single ground-truth object in the image. Our use case involves object counting, so this is a breaking issue.

Specifically, we have trained an RF-DETR small model (rfdetr 1.3.0), exported it to ONNX (opset 16), and are doing inference via a Triton Inference Server (v2.49) with onnxruntime 1.18.
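For reference, this is roughly how the exported model can be exercised directly with onnxruntime outside Triton when debugging; a minimal sketch where the file name, tensor names, resolution, and dummy preprocessing are assumptions from our own export, not anything prescribed by rfdetr:

```python
import numpy as np
import onnxruntime as ort

# Load the exported detector; the file name is a placeholder.
sess = ort.InferenceSession("rfdetr_small.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Dummy preprocessed frame: NCHW float32 at the export resolution (512x512 assumed here).
img = np.random.rand(1, 3, 512, 512).astype(np.float32)

outputs = sess.run(None, {input_name: img})
for meta, out in zip(sess.get_outputs(), outputs):
    print(meta.name, out.shape)
```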

The obvious answer is to apply NMS as a postprocessing step, but since not requiring NMS (and the inference speed gain it brings) is one of the selling points of RF-DETR over YOLO, I was wondering to what degree this is expected behavior. The expected number of detections per image for our use case is small (<20), so the NMS overhead would be minimal as well.
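For completeness, the fallback I have in mind is a plain greedy NMS over the final detections; a minimal NumPy sketch, assuming boxes come out as xyxy pixel coordinates with per-box confidence scores (which is what our current postprocessing produces):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.7) -> np.ndarray:
    """Greedy NMS. boxes: (N, 4) in xyxy format, scores: (N,). Returns indices to keep."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box with the remaining candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        # Drop candidates that overlap the kept box too much
        order = order[1:][iou <= iou_thresh]
    return np.array(keep, dtype=int)
```

With fewer than 20 detections per image this runs in microseconds, so the runtime cost is negligible.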

agvdndor, Nov 02 '25 10:11

That can happen if it doesn't finish training. How long are you training it for?

Also, I highly recommend using opset 17 or later so you get an explicit representation of the LayerNorm op. This is important for fp16, but it may also be more stable in general.
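If you're exporting by hand rather than through the library helper, it's just a matter of bumping the opset. A rough sketch with plain torch.onnx.export; the input shape, tensor names, and how you obtain the underlying torch module are placeholders, and the rfdetr export helper may expose this differently:

```python
import torch

# `model` stands for your finetuned RF-DETR detector as a torch.nn.Module
# (however you load it from your checkpoint); it is a placeholder here.
model.eval()
dummy = torch.randn(1, 3, 512, 512)  # NCHW dummy input at the export resolution

torch.onnx.export(
    model,
    dummy,
    "rfdetr_small_opset17.onnx",
    opset_version=17,                 # opset 17+ keeps LayerNorm as an explicit ONNX op
    input_names=["input"],            # placeholder name
    output_names=["dets", "labels"],  # placeholder names
    dynamic_axes={"input": {0: "batch_size"}},
)
```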

isaacrob-roboflow, Nov 02 '25 18:11

Aha, interesting. We only trained it for 20 epochs because the dataset is not that challenging and we were already seeing a near-perfect score at that point. I'll try training for longer and see if the behavior still occurs.

Can I assume that once the loss curves converge we are done training and this should no longer happen?

(Opset 16 was used because the model also has to run on an older device where we currently cannot upgrade the Triton server to a more recent version.)

agvdndor, Nov 02 '25 19:11

The loss is minimized when this behavior doesn't occur, so ideally, once training converges, the model is in that minimal state and the duplicates disappear. But it's hard to say a priori, of course.

isaacrob-roboflow, Nov 02 '25 20:11

If the issue continues, I'd recommend trying the medium model at the small resolution. The impact on runtime will be slight, and it may have an easier time with this case for some interesting theoretical reasons. If you try that experiment, please let me know the result!

isaacrob-roboflow, Nov 02 '25 20:11

We have now trained the small model for longer (85 epochs, until the loss converged and early stopping kicked in). We have also bumped to opset 17 where possible. Over the past two weeks we have definitely seen a decrease in the occurrence of this behavior, but it still happens sporadically. Just to be safe, we will be implementing NMS as an additional postprocessing step.

We have yet to try the medium model at 512x512 resolution. I'm curious what the impact is of training the models at a resolution other than their native, documented one, in terms of accuracy and performance. Would you expect finetuning the medium model at a resolution different from 576x576 to be on par with finetuning it behind a preprocessing step that resizes to the native resolution?
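Concretely, the experiment I have in mind looks something like the sketch below, assuming RFDETRMedium accepts a resolution argument the way the README shows for the other variants; the dataset path and hyperparameters are placeholders:

```python
from rfdetr import RFDETRMedium

# Medium architecture, but finetuned at the small model's 512x512 resolution.
# The resolution argument and train() signature follow the patterns in the rfdetr README;
# if the library enforces a divisibility constraint on resolution, this value may need adjusting.
model = RFDETRMedium(resolution=512)

model.train(
    dataset_dir="path/to/coco-format-dataset",  # placeholder
    epochs=85,
    batch_size=8,
    grad_accum_steps=2,
    lr=1e-4,
)
```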

agvdndor, Nov 14 '25 10:11

These models were all discovered with NAS and finetuned (see the newly released paper for details), so other resolutions were very much within domain until the final stage. I don't have specific numbers, but it's probably safe, just not Pareto-optimal on COCO. That said, we show that running NAS directly on other datasets is better than transferring COCO-optimal architectures, so this may be one of those cases.

Are the boxes exactly the same but with different labels? Or do they just have very high overlap without being exactly the same?

isaacrob-roboflow, Nov 14 '25 16:11