
Let's benchmark D-FINE

Open ArgoHA opened this issue 8 months ago • 44 comments

We have opened an additional issue in hopes of ultimately benchmarking D-FINE with RF100-VL

Let me help you with that. I have my own training pipeline that is simpler than the original D-FINE repo. It uses the same model and losses. I can help you run it.

Also, the D-FINE authors' note on overfitting was about the coco+obj365 weights, not the coco weights.

We highly recommend that you use the Objects365 pre-trained model for fine-tuning:

⚠️ Important: Please note that this is generally beneficial for complex scene understanding. If your categories are very simple, it might lead to overfitting and suboptimal performance.

And you present it the wrong way:

D-FINE’s fine-tuning capability is currently unavailable, making its domain adaptability performance inaccessible. The authors caution that “if your categories are very simple, it might lead to overfitting and suboptimal performance.”

https://github.com/ArgoHA/custom_d_fine

ArgoHA avatar Apr 14 '25 09:04 ArgoHA

I can't spend my compute on training, but I can get you all the steps you need to train the model. Let me know if you are interested, because at this moment it looks weird that you don't show results from a SOTA model.

ArgoHA avatar Apr 14 '25 10:04 ArgoHA

Hi @ArgoHA 👋🏻 Thanks for your interest in RF-DETR! @probicheaux / @isaacrob-roboflow looks like something we would like to do.

SkalskiP avatar Apr 14 '25 10:04 SkalskiP

We have an ongoing issue https://github.com/Peterande/D-FINE/issues/214 in the D-FINE repo trying to get training working. It's not about the complexity of the pipeline; it's that we just don't get good results with their pipeline regardless of the starting weights we use. They're currently waiting on me to upload logs from from-scratch training.

Not sure how a more convenient wrapper would help?

isaacrob-roboflow avatar Apr 14 '25 14:04 isaacrob-roboflow

@ArgoHA I would want to see someone get success with one dataset in RF100-VL and then I'd be happy to throw compute at that formula. Shouldn't take too much compute to test a single dataset, think you could check the dataset we linked in that repo with your code?

isaacrob-roboflow avatar Apr 14 '25 14:04 isaacrob-roboflow

It's not a wrapper, it's a complete rewrite of everything besides the model architecture and loss functions.

I can try one dataset, which one do you recommend? I see that RF100-VL has a bunch of them.

ArgoHA avatar Apr 14 '25 15:04 ArgoHA

please use the one we linked in our issue in D-FINE repo and show that you can get different results :) https://github.com/Peterande/D-FINE/issues/214

isaacrob-roboflow avatar Apr 14 '25 15:04 isaacrob-roboflow

I don't see the exact dataset you linked. I see your RF100-VL collection, which consists of many datasets. Can you give me one link? For roboflow.download_dataset(roboflow_url, "coco"), I don't see the roboflow_url.

ArgoHA avatar Apr 14 '25 16:04 ArgoHA

bottom of this message :) https://github.com/Peterande/D-FINE/issues/214#issuecomment-2759574534 but here is the direct link .. a small dataset from a visit to an aquarium

isaacrob-roboflow avatar Apr 14 '25 17:04 isaacrob-roboflow

ok, I definitely get the model training without tweaks. Here it is in progress:

[Image: training run in progress]

ArgoHA avatar Apr 14 '25 18:04 ArgoHA

btw, I checked like 2 images and can see that the ground truth is pretty messy. I would assume it's not correct to benchmark anything on such noisy ground truth, is it? On this image the ground truth is shown with green bboxes and the model prediction in blue. There are at least 4 penguins that are missing from the ground truth but were detected by the model.

[Image: ground truth (green) vs model predictions (blue) on an aquarium image]

ArgoHA avatar Apr 14 '25 18:04 ArgoHA

there are some missed detections in the gt, although I think most images in the dataset are better than that one! :)

it means that for two models that are super close in mAP, we can't make a judgement call as to which is better. however, all the other models we benchmark in our RF100-VL paper get >50 mAP on this dataset, and in that issue with D-FINE, we got <2 mAP, so not at a level where we would consider that kind of noise relevant

isaacrob-roboflow avatar Apr 14 '25 18:04 isaacrob-roboflow

can you say a bit more about how your repo is differentiated from theirs? why do you think we saw such poor results with their repo, both with and without pretrained weights?

isaacrob-roboflow avatar Apr 14 '25 18:04 isaacrob-roboflow

I use coco pretrained weights, plus their architecture and loss functions. Everything else I wrote from scratch. Maybe there is an issue with their warmup/lr. Maybe something else is broken; I can't say without debugging. I have different augs, including mosaic. OneCycleLR. The EMA model is very different; theirs is too slow. Batch accum, although I didn't use it. A bunch of other features, but they should not make a lot of difference in the loss getting lower.
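For illustration, the scheduler part looks roughly like this (a minimal sketch with illustrative hyperparameters, not my exact config):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the detector
epochs, steps_per_epoch = 100, 250  # illustrative values

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# one LR cycle over the whole run: warmup to max_lr, then anneal down
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, total_steps=epochs * steps_per_epoch
)

for _ in range(epochs * steps_per_epoch):
    optimizer.step()   # after loss.backward() in a real training loop
    scheduler.step()   # OneCycleLR steps once per batch, not per epoch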

I can help you set up benchmarking with my repo if you want.

ArgoHA avatar Apr 14 '25 18:04 ArgoHA

what happens if you use o365 pretrained weights?

sounds like there is a bug in their code somewhere. nice job building a version without whatever bug is in theirs!

our goal for rf100-vl benchmarking is to have a piece of code that we can point at a roboflow url and have training happen, which then dumps results on the test set; we aggregate those results for all 100 datasets later. we use the same hparams for all datasets, with a fixed bs of 16 and 100 epochs, taking the best model by val loss / mAP. can you send some minimal code to reproduce the graphs you're showing above?

isaacrob-roboflow avatar Apr 14 '25 18:04 isaacrob-roboflow

I use a different dataset format: YOLO txt files, with splits in csv files (image names). All images in one folder (same for labels, but in a labels folder). To get there I usually just upload all images and labels and then run a split script. As I understand, you will have train, valid and test in separate folders, so there should be a small script to convert that to my repo's format; a sketch of such a script is below.
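A minimal sketch (I'm assuming a csv schema of one image filename per row; check my split script for the exact format):

import csv
import shutil
from pathlib import Path

SRC = Path("roboflow_export")  # Roboflow folders: train/ valid/ test/
DST = Path("converted")        # target layout: images/, labels/, <split>.csv

(DST / "images").mkdir(parents=True, exist_ok=True)
(DST / "labels").mkdir(parents=True, exist_ok=True)

for split in ("train", "valid", "test"):
    names = []
    for img in sorted((SRC / split / "images").iterdir()):
        shutil.copy(img, DST / "images" / img.name)
        label = SRC / split / "labels" / (img.stem + ".txt")
        if label.exists():
            shutil.copy(label, DST / "labels" / label.name)
        names.append(img.name)
    # one filename per row; this schema is an assumption
    with open(DST / f"{split}.csv", "w", newline="") as f:
        csv.writer(f).writerows([[n] for n in names])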

After that it should be straightforward. Clone the repo, update the path in the config file, run the preprocess script if not all images are in jpg. Run the script to get from your folders to csv files with splits (as I described above). Then run the train script and get the model. In my pipeline I pick the best model by the best avg(mAP50, f1). After the model is trained, you can run the export script to get tensorrt or openvino. For each format I have an inference class. You can see more info in the readme.

I can share the (more or less) default config file (with hyperparams) I use for new tasks; you can go with it. If you tell me the exact data format you download with your code, I can write you a script to process it into my repo's format.

ArgoHA avatar Apr 14 '25 19:04 ArgoHA

btw, why is rf-detr b so slow? In your benchmarks it is as fast as dfine m, but I see a drastic difference. I run inference with your model like so:

import time

from PIL import Image
from rfdetr import RFDETRBase


class RFDETRModel:
    def __init__(self):
        self.conf = 0.5  # confidence threshold for predictions
        self.model = RFDETRBase(
            pretrain_weights="/home/argo/Desktop/Projects/aquarium/rf_detr_res/checkpoint_best_ema.pth",
        )

    def __call__(self, img):
        image = Image.open(img)
        t0 = time.perf_counter()  # time only the forward pass, not image loading

        detections = self.model.predict(image, threshold=self.conf)
        res = [
            {
                "labels": detections.class_id,
                "boxes": detections.xyxy,
                "scores": detections.confidence,
            }
        ]
        # return detections plus latency in milliseconds
        return res, (time.perf_counter() - t0) * 1000

D-FINE m is 2.5 times faster... Am I missing something? (here "Torch" is the D-FINE model, torch inference)

[Image: latency comparison table]

ArgoHA avatar Apr 14 '25 19:04 ArgoHA

roboflow allows export in many formats, see https://roboflow.com/formats. if you're using yolo format, that should be plug and play by changing the "coco" to "yolo" (or whatever the specific moniker is)
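something along these lines should work (a sketch; the exact format moniker and url are placeholders):

import roboflow

roboflow.login()  # uses a stored API key or prompts for one
# placeholder url; use the aquarium dataset linked earlier in the thread
dataset = roboflow.download_dataset(
    "https://universe.roboflow.com/...", "yolov8"  # moniker assumed for yolo txt
)
print(dataset.location)  # local folder containing the downloaded splits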

isaacrob-roboflow avatar Apr 14 '25 19:04 isaacrob-roboflow

the reported inference time for all of these models is TRT compiled on T4 GPU. our repo in general has not been optimized for throughput in raw torch, future work for us to do

if you are interested in having us do that optimization work, a reproducible script allowing comparison to d-fine would be useful!

isaacrob-roboflow avatar Apr 14 '25 19:04 isaacrob-roboflow

can I export to tensorrt with your repo to compare for myself? (t4 is an old card, I would like to have real-world data if I decide to use rf-detr sometime in the future)

and you asked if the o365 model trains: the answer is yes, but it converges slower and may achieve worse results (or maybe it needs more epochs). I would not train from o365 on such small datasets. In 95% of cases I use coco weights for real-world datasets.

[Image: training metrics with o365 weights]

Let me know if you need other input from my side. If you decide to use my repo, I can answer any questions you have after going through the readme file.

ArgoHA avatar Apr 14 '25 19:04 ArgoHA

@ArgoHA o365 has a superset of classes and should transfer better to diverse datasets, especially since the highest performing coco checkpoints are derived from o365 checkpoints, so to transfer from coco is to transfer from a finetuned variant of o365 anyway. we find that for us o365 gives better performance than coco

surprised to see otherwise here .. suggests there may be another issue imo .. but we also don't want to go in and optimize everyone else's models, just want to use what the authors suggest is the best configuration for transfer and benchmark that on a transfer dataset

isaacrob-roboflow avatar Apr 14 '25 20:04 isaacrob-roboflow

export to trt can be done by using our onnx export method and then running a benchmark on the onnx via trtexec .. t4 is indeed an old card, but it's what's used in all these papers so we use it too! although we don't use it for our internal benchmarking of models
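roughly like this (a sketch; the export call and output filename may differ by rfdetr version):

from rfdetr import RFDETRBase

# export the fine-tuned checkpoint to ONNX
model = RFDETRBase(pretrain_weights="checkpoint_best_ema.pth")
model.export()

# then benchmark the resulting onnx on the T4 (shell command):
#   trtexec --onnx=inference_model.onnx --fp16
# trtexec builds a TensorRT engine and reports mean latency / throughput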

isaacrob-roboflow avatar Apr 14 '25 20:04 isaacrob-roboflow

@ArgoHA I would like to see a script that allows me to supply a roboflow url and receive results. I can then spin up a bunch of compute on our end to run it on all the datasets. if you can't provide such a script, we'll have to write it on our end. we're pretty swamped here so can't guarantee how long it'll be till we get to it. although it now seems like we should do so prior to releasing the paper! :)

isaacrob-roboflow avatar Apr 14 '25 20:04 isaacrob-roboflow

@isaacrob-roboflow here is the script that takes a roboflow url as input, then trains the model and saves results in your format. Details on how to run:

  1. in your working dir put robo_train.py (link) and config.yaml (link)
  2. git clone https://github.com/ArgoHA/custom_d_fine.git
  3. download the l and x models and put them into custom_d_fine/pretrained, named like dfine_{model_size}_{pretrained_dataset}, e.g. dfine_l_coco.pth or dfine_l_obj2coco.pth. Download here.
  4. python robo_train.py

Some notes:

  • config.yaml has some generic configs that you can consider recommended
  • in robo_train.py I change some configs like model_name, dataset path, dataset labels, epochs, batch size. You can change or add any other configs you want.
  • the script will download your dataset, convert the splitting format (I use csv files), train each model size, and create the usual output (model weights, metrics). And most importantly: your json file is stored in {dataset_name}/results_output/{model_name}_result.json. It contains mAP50 and mAP50-95 measured on the test folder, the model name and the dataset url. A rough sketch of this flow is below.
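Rough sketch (helper names are hypothetical stand-ins, not the script's real functions):

import json
from pathlib import Path

import roboflow


def convert_to_csv_splits(dataset_dir: str) -> None:
    """Hypothetical stand-in: folder splits -> images/labels dirs + csv files."""


def train_model(model_name: str, dataset_dir: str) -> dict:
    """Hypothetical stand-in for the repo's training entry point."""
    return {"map50": 0.0, "map50_95": 0.0}


def run(dataset_url: str, model_sizes=("l", "x")) -> None:
    dataset = roboflow.download_dataset(dataset_url, "yolov8")  # format assumed
    convert_to_csv_splits(dataset.location)
    for size in model_sizes:
        model_name = f"dfine_{size}_coco"
        metrics = train_model(model_name, dataset.location)
        out = Path(Path(dataset.location).name) / "results_output" / f"{model_name}_result.json"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps({
            "model": model_name,
            "dataset_url": dataset_url,
            "mAP50": metrics["map50"],
            "mAP50-95": metrics["map50_95"],
        }, indent=2))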

Let me know if you have questions.

ArgoHA avatar Apr 21 '25 08:04 ArgoHA

@isaacrob-roboflow hey, any reply? I hope I did not waste my time here :)

ArgoHA avatar May 02 '25 17:05 ArgoHA

Hi! This is on our list :) we are a very small team with a lot going on so I appreciate your patience as we finalize some other work first

isaacrob-roboflow avatar May 02 '25 19:05 isaacrob-roboflow

hi @ArgoHA when you run in pytorch, do you call .export first? so before running inference, do model.eval() then model.export()? there are some internals that get optimized only at export time, which may be part of why you see it so slow

isaacrob-roboflow avatar May 08 '25 20:05 isaacrob-roboflow

Hey @isaacrob-roboflow. I assume you are referring to the RF-DETR inference latency? If yes, I shared how I run inference in a message above.

Ok, now I tried this:

    def __init__(self):
        self.conf = 0.5
        self.model = RFDETRBase(pretrain_weights="path")
        self.model.eval()

I get: AttributeError: 'RFDETRBase' object has no attribute 'eval'

And it's clear why. RFDETR is not a "pytorch model"; it's your custom class - here. I also see you logically doing self.model.model.eval() at the start of the predict script.
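For now the closest equivalent seems to be reaching the inner torch module directly (attribute chain inferred from your source, not a documented API):

from rfdetr import RFDETRBase

detr = RFDETRBase(pretrain_weights="path")
# the wrapper's .model holds the inner object whose .model is the torch module,
# judging by the predict script; this is an inferred, undocumented path
detr.model.model.eval()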

ArgoHA avatar May 09 '25 05:05 ArgoHA

FYI: D-FINE is now available in the Transformers package v4.51.3-D-FINE-preview. Could be helpful for benchmarking.

tuomastik avatar May 09 '25 06:05 tuomastik

@tuomastik can it be trained via the Transformers api? If not, then it can't help with benchmarking, as the guys from Roboflow want to train the model on dozens of datasets and average the test metrics for each small dataset.

ArgoHA avatar May 09 '25 09:05 ArgoHA

@tuomastik can it be trained via Transformers api?

I haven't tried, but it should be doable using their Trainer class or a custom PyTorch training loop (a minimal loading sketch follows the links below). More info:

  • https://huggingface.co/docs/transformers/en/tasks/object_detection
  • https://huggingface.co/docs/transformers/en/main_classes/trainer
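Something like this for loading the preview (the checkpoint id below is illustrative; check the Hugging Face hub for the real D-FINE ids):

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

ckpt = "ustc-community/dfine-medium-coco"  # illustrative hub id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForObjectDetection.from_pretrained(ckpt).eval()

image = Image.open("sample.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# convert raw outputs to thresholded detections in pixel coordinates
results = processor.post_process_object_detection(
    outputs, target_sizes=[image.size[::-1]], threshold=0.5
)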

tuomastik avatar May 09 '25 11:05 tuomastik