rf-detr
model not converging and inference issue
Search before asking
- [x] I have searched the RF-DETR issues and found no similar bug report.
Bug
- Looking at the metric plot, it seems the model is not converging.
- In the training script I haven't set any resolution, and I'm not sure what image size the code defaults to when feeding the model.
- My guess: is RF-DETR here resizing my actual images (3840x1620) down to 640x640, which shrinks the image so much that the player and ball become too small to detect? If so, how do I fix it?
- The inference logs for some reason print that the detection head is being reinitialized with 1 class instead of the model's actual 2 classes.
- Am I making a mistake in the inference code by any chance? (See the inference logs below.)
- The inference code does produce detections when the confidence threshold is set to 0.25, but it also causes a lot of false positives.
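One cheap mitigation for the false positives at a low global threshold is to filter per class instead of using one cutoff for everything. This is a generic post-processing sketch, not part of RF-DETR; the class ids and threshold values are illustrative assumptions:

```python
# Generic post-filter: apply a different confidence threshold per class.
# Class ids are assumptions here (0 = player, 1 = ball); detections are
# (class_id, score, box) tuples from whatever inference code you run.
PER_CLASS_THRESH = {0: 0.5, 1: 0.3}  # keep a lower bar for the tiny ball

def filter_detections(detections, default_thresh=0.25):
    kept = []
    for class_id, score, box in detections:
        if score >= PER_CLASS_THRESH.get(class_id, default_thresh):
            kept.append((class_id, score, box))
    return kept

dets = [(0, 0.90, (10, 10, 50, 120)),  # confident player -> kept
        (0, 0.30, (0, 0, 5, 5)),       # weak player -> dropped
        (1, 0.35, (70, 40, 78, 48))]   # ball just above its bar -> kept
print(filter_detections(dets))
```

This lets you keep the ball recall you get at 0.25 while raising the bar for the class that generates most of the false positives.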
Info related to the setup:
- I am training on an A100 GPU (80 GB).
- It's a basketball-game dataset with 2 classes, "players" and "ball". More details can be seen in the _annotations.coco.json file for the dataset.
- I have attached the training script above for reference, along with results.json, logs.txt, and the metric plot.
- Dataset info: training set = 40k images, valid set = 18k, test set = 17k images.
- Image info: each image is 3840x1620 (basically 4K frames).
- The inference code is in the infer.txt file.
Any help is appreciated!!
Inference logs -
D:\rf-detr> python .\infer_detr.py
Model Type: custom
Checkpoint Folder: M0
Input Folder: D:\rf-detr\frames
Output Folder: D:\rf-detr\detr_eval_output\inference_output_M0_annotated
Output Video: D:\rf-detr\detr_eval_output\inference_output_M0_video.mp4
Confidence Threshold: 0.5
Video FPS: 30
Number of Classes: 2
Class Names: ['player', 'ball']
[Before] Allocated: 0.00 MB | Reserved: 0.00 MB
[After] Allocated: 0.00 MB | Reserved: 0.00 MB
Using custom checkpoint...
Loading model with 2 classes from D:\rf-detr\detr_train_output\M0\checkpoint_best_regular.pth
Using a different number of positional encodings than DINOv2, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Using patch size 16 instead of 14, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Loading pretrain weights
num_classes mismatch: pretrain weights has 1 classes, but your model has 2 classes
reinitializing detection head with 1 classes
loss_type=None was set in the config but it is unrecognised.Using the default loss: ForCausalLMLoss .
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the....
Environment
- Python: 3.9.19
- torch: 2.5.1+cu121
- RF-DETR version: 1.2.1
- CUDA version: 12.1
- cuDNN version: 90100
- Platform: Windows 10
Minimal Reproducible Example
from rfdetr.detr import RFDETRNano

dataset_dir = r"D:\datasets\object_detector\coco_format_ready"
output_dir = r"D:\rf-detr\detr_train_output"

sweep = [
    dict(name="M1", focal_alpha=0.4, cls_loss_coef=1.4),
]

if __name__ == '__main__':
    for cfg in sweep:
        run_out = f"{output_dir}/{cfg['name']}"
        model = RFDETRNano()
        model.train(
            dataset_dir=dataset_dir,
            output_dir=run_out,
            run=cfg["name"],
            epochs=30,
            batch_size=64,
            lr=1e-4,
            use_amp=True,
            amp_dtype="bf16",  # only for L40 or A100 type GPUs
            gradient_checkpointing=False,
            grad_accum_steps=1,
            num_workers=4,
            pin_memory=True,
            persistent_workers=True,
            prefetch_factor=4,
            channels_last=True,
            focal_alpha=cfg["focal_alpha"],
            cls_loss_coef=cfg["cls_loss_coef"],
            eval_interval=2,  # run evaluation every 2 epochs
            tensorboard=True,
        )
Additional
Are you willing to submit a PR?
- [ ] Yes, I'd like to help by submitting a PR!
If you don't specify a resolution, it downsizes the images to 384x384 (iirc) for nano. Either use medium, which runs at a higher resolution, or set a resolution manually. Note that setting a higher resolution will make the model slower. If you're going for a higher resolution, I'd use medium and manually set whatever feels appropriate to you. But we haven't really experimented with super high resolutions.
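A sketch of setting the resolution explicitly. Two assumptions here, both worth verifying against the README/source for your installed version: that the model constructor accepts a `resolution` kwarg, and that the value must be a multiple of 56 (iirc that is the constraint for the base model):

```python
# Sketch: choose an explicit training resolution instead of the default.
# The divisor of 56 is an assumption from memory of the RF-DETR README;
# check the error message / docs for your version before relying on it.

def snap_resolution(desired, divisor=56):
    """Round `desired` to the nearest multiple of `divisor` (at least one)."""
    return max(divisor, round(desired / divisor) * divisor)

res = snap_resolution(800)
print(res)

# Usage sketch (requires the rfdetr package; kwarg name is an assumption):
# from rfdetr import RFDETRMedium
# model = RFDETRMedium(resolution=res)
# model.train(dataset_dir=..., output_dir=..., epochs=30, ...)
```

A larger input keeps the ball at a detectable pixel size, at the cost of slower training and inference.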
I see, a few clarification questions here:
- Does the medium-sized model support training on non-square images (e.g. 3840x1620)? If so, how can we pass that?
- Does the model also downsize the image during inference?
- Also, do you have any comments on the inference class-initialization issue?
Non-square training thread: https://github.com/roboflow/rf-detr/issues/275
The num-classes initialization message is an ongoing issue that seems to be with the print statement, not with the underlying model. Try inference and see if it works; there are a lot of threads on that.
Yes, the model resizes during inference.
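Since the resize happens internally, you normally don't need to rescale anything yourself, but if you ever post-process raw model-space boxes, the mapping back to the original frame is just the inverse scale. A sketch assuming a plain stretched square resize (no letterbox/padding), which is an assumption about the preprocessing rather than a statement about rf-detr internals:

```python
# Sketch: map a box predicted on a square-resized input back to the original
# frame, assuming plain stretching (no letterbox/padding) -- an assumption
# about the preprocessing, not confirmed rf-detr behavior.
def unscale_box(box, model_size, orig_w, orig_h):
    x1, y1, x2, y2 = box
    sx = orig_w / model_size
    sy = orig_h / model_size
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# A box at (100, 100)-(120, 110) on a 384x384 input, original frame 3840x1620:
print(unscale_box((100, 100, 120, 110), 384, 3840, 1620))
```

If the library already returns boxes in original-image coordinates (which is typical for high-level inference APIs), this step is unnecessary.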