RF-DETR on Remote Sensing
Hi Team, thanks for creating a great GitHub repo with loads of features.
I would like to explore whether the backbone of RF-DETR can be updated/altered to use a vision foundation model for feature extraction. I was thinking of using Panopticon (https://github.com/Panopticon-FM/panopticon), which is also built on a DINOv2 base.
Any guidance or a step-by-step process on how I could proceed with the integration would be wonderful.
I would like to use the altered architecture for object detection on multimodal remote sensing data (optical, multispectral, hyperspectral, SAR).
Panopticon is trained on extensive remote sensing data (optical, multispectral, hyperspectral, SAR), so using Panopticon weights for feature extraction would be a good fit for remote sensing object detection use cases.
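To make the backbone-swap idea concrete, here is a rough sketch of the kind of wrapper I have in mind. This is only a sketch under my own assumptions: the dummy encoder stands in for Panopticon, and the token layout, projection width, and class names are placeholders rather than the actual Panopticon or RF-DETR interfaces.

```python
import torch
import torch.nn as nn

class PanopticonBackbone(nn.Module):
    """Hypothetical wrapper: expose a DINOv2-style ViT encoder as a
    spatial feature map that a DETR-style detection head could consume."""

    def __init__(self, encoder: nn.Module, embed_dim: int = 768, patch_size: int = 14):
        super().__init__()
        self.encoder = encoder          # e.g. a Panopticon / DINOv2 ViT (assumed interface)
        self.patch_size = patch_size
        # Project ViT embeddings to the channel width the detector expects (256 is a placeholder).
        self.proj = nn.Conv2d(embed_dim, 256, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        # Assumption: encoder returns patch tokens of shape (B, N, C) with the CLS token stripped.
        tokens = self.encoder(x)
        gh, gw = h // self.patch_size, w // self.patch_size
        feat = tokens.transpose(1, 2).reshape(b, -1, gh, gw)  # (B, C, H/ps, W/ps)
        return self.proj(feat)

# Placeholder encoder standing in for Panopticon, just so the sketch runs end to end.
class DummyViT(nn.Module):
    def __init__(self, embed_dim: int = 768, patch_size: int = 14):
        super().__init__()
        self.patchify = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        return self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, C)

backbone = PanopticonBackbone(DummyViT())
features = backbone(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 256, 16, 16])
```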
KR Anirban
Hi, just for your information: the new DINOv3 release includes a model pretrained on large-scale Remote Sensing / Satellite imagery (a dataset of 493 million images). Since RF-DETR was originally built on DINOv2, this new version might be useful for your work.
https://github.com/facebookresearch/dinov3?tab=readme-ov-file#pretrained-models
@ROYUNSW I would suggest training the model as-is on your data before trying to modify it :) Our backbone is DINOv2, which IS a vision foundation model, but we also do work to make it faster, and we do additional pretraining on it that significantly improves object detection performance and might be costly for many third parties to replicate. More specific details on this will be released in our paper, which is under development.
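For reference, training as-is on a COCO-formatted dataset looks roughly like the sketch below. This follows the fine-tuning interface described in the README at the time of writing; treat the paths and hyperparameters as placeholders and check the current docs for the exact argument names.

```python
# Rough sketch of fine-tuning RF-DETR as-is on a custom COCO-format dataset.
# Dataset path and hyperparameters are placeholders; see the RF-DETR README
# for the currently supported arguments.
from rfdetr import RFDETRBase

model = RFDETRBase()  # loads the pretrained checkpoint with the DINOv2-based backbone

model.train(
    dataset_dir="path/to/remote_sensing_dataset",  # COCO-format train/valid/test splits
    epochs=20,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
)
```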
@isaacrob-roboflow, thanks a lot for your response and guidance. The idea behind using a foundation model attuned to remote sensing (Panopticon) is to have feature-extractor weights pretrained on data relevant to the domain.
@XiphosF, thanks a lot for your guidance. This repo seems to be hot off the press; let me explore it and get back to you with further questions.
Hey @ROYUNSW! With LightlyTrain you can leverage knowledge distillation from DINOv3 together with RF-DETR or any other vision backbone: https://github.com/lightly-ai/lightly-train
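A minimal sketch of that workflow is below, assuming an unlabeled image folder. The model identifier is a placeholder (swap in the backbone you actually want to pretrain), and the exact strings LightlyTrain supports should be taken from its docs.

```python
# Minimal sketch of knowledge-distillation pretraining with LightlyTrain.
# The output path, data path, and model string are placeholders; check the
# LightlyTrain documentation for the model identifiers it supports.
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/rs_distillation",                       # experiment logs and checkpoints
        data="path/to/unlabeled_remote_sensing_images",  # folder of images, no labels required
        model="torchvision/resnet50",                    # placeholder backbone to pretrain
        method="distillation",                           # distill from a foundation-model teacher
    )
```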
I would not recommend doing that, as our backbone has additionally been pretrained on Objects365 as part of end-to-end training with the rest of the model. Distilling a different model into it at that point would effectively undo that pretraining, and that pretraining significantly improves performance on downstream datasets.
Hi @liopeer, I have successfully used LightlyTrain's knowledge distillation method to distill from DINOv3 into YOLOv11, and the results are impressive.
Paper is live if of interest :) https://arxiv.org/abs/2511.09554