RF-DETR on Remote Sensing
Hi Team, thanks for creating a great GitHub repo with loads of features.
I would like to explore whether the backbone of RF-DETR can be updated/altered to use a vision foundation model for feature extraction. I was thinking of using Panopticon (https://github.com/Panopticon-FM/panopticon), which is also built on a DINOv2 base.
Any guidance or a step-by-step process on how I could proceed with the integration would be wonderful.
I would like to use the altered architecture for object detection on multimodal remote sensing data (optical, multispectral, hyperspectral, SAR).
Panopticon is trained on extensive remote sensing data (optical, multispectral, hyperspectral, SAR), so using Panopticon weights for feature extraction would be a good fit for remote sensing object detection use cases.
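To make the backbone-swap idea concrete, here is a rough sketch of the kind of wrapper I have in mind. This is only a sketch under my own assumptions: the dummy encoder stands in for Panopticon, and the token layout, projection width, and class names are placeholders rather than the actual Panopticon or RF-DETR interfaces.

```python
import torch
import torch.nn as nn

class PanopticonBackbone(nn.Module):
    """Hypothetical wrapper: expose a DINOv2-style ViT encoder as a
    spatial feature map that a DETR-style detection head could consume."""

    def __init__(self, encoder: nn.Module, embed_dim: int = 768, patch_size: int = 14):
        super().__init__()
        self.encoder = encoder          # e.g. a Panopticon / DINOv2 ViT (assumed interface)
        self.patch_size = patch_size
        # Project ViT embeddings to the channel width the detector expects (256 is a placeholder).
        self.proj = nn.Conv2d(embed_dim, 256, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        # Assumption: encoder returns patch tokens of shape (B, N, C) with the CLS token stripped.
        tokens = self.encoder(x)
        gh, gw = h // self.patch_size, w // self.patch_size
        feat = tokens.transpose(1, 2).reshape(b, -1, gh, gw)  # (B, C, H/ps, W/ps)
        return self.proj(feat)

# Placeholder encoder standing in for Panopticon, just so the sketch runs end to end.
class DummyViT(nn.Module):
    def __init__(self, embed_dim: int = 768, patch_size: int = 14):
        super().__init__()
        self.patchify = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        return self.patchify(x).flatten(2).transpose(1, 2)  # (B, N, C)

backbone = PanopticonBackbone(DummyViT())
features = backbone(torch.randn(1, 3, 224, 224))
print(features.shape)  # torch.Size([1, 256, 16, 16])
```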
KR Anirban
Hi, just for your information: the new DINOv3 release includes a model pretrained on large-scale Remote Sensing / Satellite imagery (a dataset of 493 million images). Since RF-DETR was originally built on DINOv2, this new version might be useful for your work.
https://github.com/facebookresearch/dinov3?tab=readme-ov-file#pretrained-models
@ROYUNSW I would suggest training the model as-is on your data before trying to modify it :) Our backbone is DINOv2, which IS a vision foundation model, but we also do work to make it faster, and we do additional pretraining on it that significantly improves object detection performance and might be costly for many third parties to replicate. More specific details on this will be released in our paper, which is under development.
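For reference, training as-is on a COCO-formatted dataset looks roughly like the sketch below. This follows the fine-tuning interface described in the README at the time of writing; treat the paths and hyperparameters as placeholders and check the current docs for the exact argument names.

```python
# Rough sketch of fine-tuning RF-DETR as-is on a custom COCO-format dataset.
# Dataset path and hyperparameters are placeholders; see the RF-DETR README
# for the currently supported arguments.
from rfdetr import RFDETRBase

model = RFDETRBase()  # loads the pretrained checkpoint with the DINOv2-based backbone

model.train(
    dataset_dir="path/to/remote_sensing_dataset",  # COCO-format train/valid/test splits
    epochs=20,
    batch_size=4,
    grad_accum_steps=4,
    lr=1e-4,
)
```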
@isaacrob-roboflow, thanks a lot for your response and guidance. The idea behind using a foundation model attuned to remote sensing (Panopticon) is to have feature-extractor weights pretrained on data relevant to the domain.
@XiphosF, thanks a lot for your guidance. This repo seems to be hot off the press; let me explore it and get back to you with further questions.
Hey @ROYUNSW! With LightlyTrain you can leverage knowledge distillation from DINOv3 together with RF-DETR or any other vision backbone: https://github.com/lightly-ai/lightly-train
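A minimal sketch of that workflow is below, assuming an unlabeled image folder. The model identifier is a placeholder (swap in the backbone you actually want to pretrain), and the exact strings LightlyTrain supports should be taken from its docs.

```python
# Minimal sketch of knowledge-distillation pretraining with LightlyTrain.
# The output path, data path, and model string are placeholders; check the
# LightlyTrain documentation for the model identifiers it supports.
import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/rs_distillation",                       # experiment logs and checkpoints
        data="path/to/unlabeled_remote_sensing_images",  # folder of images, no labels required
        model="torchvision/resnet50",                    # placeholder backbone to pretrain
        method="distillation",                           # distill from a foundation-model teacher
    )
```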
I would not recommend doing that, as our backbone has additionally been pretrained on Objects365 as part of end-to-end training with the rest of the model. Distilling a different model into it at that point would effectively undo that pretraining, and that pretraining significantly improves performance on downstream datasets.
Hi @liopeer, I have successfully used LightlyTrain's knowledge distillation method to distill from DINOv3 into YOLOv11, and the results are impressive.
Paper is live if of interest :) https://arxiv.org/abs/2511.09554