rf-detr
model not converging and inference issue
Search before asking
- [x] I have searched the RF-DETR issues and found no similar bug report.
Bug
- Looking at the metric plot, it seems the model is not converging.
- In the training script I haven't set any resolution, and I'm not sure what image size the code defaults to when feeding the model.
- My guess: is RF-DETR here resizing my actual images (3840x1620) down to 640x640, which shrinks the image so much that the player and ball become too small to detect? If so, how do I fix it?
- The inference logs for some reason print that the detection head is being reinitialized with 1 class instead of the model's actual 2 classes.
- Am I making a mistake in the inference code by any chance? (See the inference logs below.)
- The inference code does produce detections when the confidence threshold is set to 0.25, but it also causes a lot of false positives.
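One cheap mitigation for the false positives at a low global threshold is to filter per class instead of using one cutoff for everything. This is a generic post-processing sketch, not part of RF-DETR; the class ids and threshold values are illustrative assumptions:

```python
# Generic post-filter: apply a different confidence threshold per class.
# Class ids are assumptions here (0 = player, 1 = ball); detections are
# (class_id, score, box) tuples from whatever inference code you run.
PER_CLASS_THRESH = {0: 0.5, 1: 0.3}  # keep a lower bar for the tiny ball

def filter_detections(detections, default_thresh=0.25):
    kept = []
    for class_id, score, box in detections:
        if score >= PER_CLASS_THRESH.get(class_id, default_thresh):
            kept.append((class_id, score, box))
    return kept

dets = [(0, 0.90, (10, 10, 50, 120)),  # confident player -> kept
        (0, 0.30, (0, 0, 5, 5)),       # weak player -> dropped
        (1, 0.35, (70, 40, 78, 48))]   # ball just above its bar -> kept
print(filter_detections(dets))
```

This lets you keep the ball recall you get at 0.25 while raising the bar for the class that generates most of the false positives.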
Info related to the setup:
- I am training on an A100 GPU (80 GB).
- It's a basketball-game dataset with 2 classes, "players" and "ball". More details can be seen in the _annotations.coco.json file for the dataset.
- I have attached the training script above for reference, along with results.json, logs.txt, and the metric plot.
- Dataset info: training set = 40k images, valid set = 18k, test set = 17k images.
- Image info: each image is 3840x1620 (basically 4K frames).
- The inference code is in the infer.txt file.
Any help is appreciated!!
Inference logs -
D:\rf-detr> python .\infer_detr.py
Model Type: custom
Checkpoint Folder: M0
Input Folder: D:\rf-detr\frames
Output Folder: D:\rf-detr\detr_eval_output\inference_output_M0_annotated
Output Video: D:\rf-detr\detr_eval_output\inference_output_M0_video.mp4
Confidence Threshold: 0.5
Video FPS: 30
Number of Classes: 2
Class Names: ['player', 'ball']
[Before] Allocated: 0.00 MB | Reserved: 0.00 MB
[After] Allocated: 0.00 MB | Reserved: 0.00 MB
Using custom checkpoint...
Loading model with 2 classes from D:\rf-detr\detr_train_output\M0\checkpoint_best_regular.pth
Using a different number of positional encodings than DINOv2, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Using patch size 16 instead of 14, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Loading pretrain weights
num_classes mismatch: pretrain weights has 1 classes, but your model has 2 classes
reinitializing detection head with 1 classes
loss_type=None was set in the config but it is unrecognised.Using the default loss: ForCausalLMLoss .
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the....
Environment
- Python: 3.9.19
- torch: 2.5.1+cu121
- RF-DETR version: 1.2.1
- CUDA version: 12.1
- cuDNN version: 90100
- Platform: Windows 10
Minimal Reproducible Example
from rfdetr.detr import RFDETRNano

dataset_dir = r"D:\datasets\object_detector\coco_format_ready"
output_dir = r"D:\rf-detr\detr_train_output"

sweep = [
    dict(name="M1", focal_alpha=0.4, cls_loss_coef=1.4),
]

if __name__ == '__main__':
    for cfg in sweep:
        run_out = f"{output_dir}/{cfg['name']}"
        model = RFDETRNano()
        model.train(
            dataset_dir=dataset_dir,
            output_dir=run_out,
            run=cfg["name"],
            epochs=30,
            batch_size=64,
            lr=1e-4,
            use_amp=True,
            amp_dtype="bf16",  # only for L40 or A100 type GPUs
            gradient_checkpointing=False,
            grad_accum_steps=1,
            num_workers=4,
            pin_memory=True,
            persistent_workers=True,
            prefetch_factor=4,
            channels_last=True,
            focal_alpha=cfg["focal_alpha"],
            cls_loss_coef=cfg["cls_loss_coef"],
            eval_interval=2,  # run evaluation every 2 epochs
            tensorboard=True,
        )
Additional
Are you willing to submit a PR?
- [ ] Yes, I'd like to help by submitting a PR!
If you don't specify a resolution, it downsizes the images to 384x384 (iirc) for nano. Either use medium, which runs at a higher resolution, or set a resolution manually. Note that setting a higher resolution will make the model slower. If you're going for a higher resolution, I'd use medium and manually set whatever feels appropriate to you. But we haven't really experimented with super high resolutions.
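A sketch of setting the resolution explicitly. Two assumptions here, both worth verifying against the README/source for your installed version: that the model constructor accepts a `resolution` kwarg, and that the value must be a multiple of 56 (iirc that is the constraint for the base model):

```python
# Sketch: choose an explicit training resolution instead of the default.
# The divisor of 56 is an assumption from memory of the RF-DETR README;
# check the error message / docs for your version before relying on it.

def snap_resolution(desired, divisor=56):
    """Round `desired` to the nearest multiple of `divisor` (at least one)."""
    return max(divisor, round(desired / divisor) * divisor)

res = snap_resolution(800)
print(res)

# Usage sketch (requires the rfdetr package; kwarg name is an assumption):
# from rfdetr import RFDETRMedium
# model = RFDETRMedium(resolution=res)
# model.train(dataset_dir=..., output_dir=..., epochs=30, ...)
```

A larger input keeps the ball at a detectable pixel size, at the cost of slower training and inference.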
I see, a few clarification questions here:
- Does the medium-sized model support training on non-square images (e.g. 3840x1620)? If so, how can we pass that?
- Does the model also downsize the image during inference?
- Also, do you have any comments on the inference class-initialization issue?
Non-square training thread: https://github.com/roboflow/rf-detr/issues/275
The num-classes initialization message is an ongoing issue that seems to be with the print statement, not with the underlying model. Try inference and see if it works; there are a lot of threads on that.
Yes, the model resizes during inference.
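Since the resize happens internally, you normally don't need to rescale anything yourself, but if you ever post-process raw model-space boxes, the mapping back to the original frame is just the inverse scale. A sketch assuming a plain stretched square resize (no letterbox/padding), which is an assumption about the preprocessing rather than a statement about rf-detr internals:

```python
# Sketch: map a box predicted on a square-resized input back to the original
# frame, assuming plain stretching (no letterbox/padding) -- an assumption
# about the preprocessing, not confirmed rf-detr behavior.
def unscale_box(box, model_size, orig_w, orig_h):
    x1, y1, x2, y2 = box
    sx = orig_w / model_size
    sy = orig_h / model_size
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# A box at (100, 100)-(120, 110) on a 384x384 input, original frame 3840x1620:
print(unscale_box((100, 100, 120, 110), 384, 3840, 1620))
```

If the library already returns boxes in original-image coordinates (which is typical for high-level inference APIs), this step is unnecessary.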