
SAT-493M pre-trained segmentation head weights are not provided

Open qvq521 opened this issue 5 months ago • 7 comments

When using the SAT-493M pre-trained backbone_weights, I found that the repository does not seem to provide segmentation head weights pre-trained on this dataset. Thanks!

qvq521 avatar Aug 16 '25 04:08 qvq521

Which pretrained head are you interested in? For example, there is no SAT-493M head trained on COCO.

The GEO-Bench heads are quick to train with, e.g., TerraTorch. The licensing for iSAID and DIOR does not allow releasing models trained on the imagery, as both have academic-only licenses.

The updated canopy height model on the expanded SatLidar dataset will be released at a later date.

I am happy to help provide suggestions on how to recreate any of the models, though!

JohnMBrandt avatar Aug 16 '25 13:08 JohnMBrandt

Did you use Mask2Former head for segmentation? @JohnMBrandt

aselimc avatar Aug 16 '25 13:08 aselimc

No, all the geospatial benchmarks use DPT or UperNet, as noted in the appendix!

JohnMBrandt avatar Aug 16 '25 13:08 JohnMBrandt

I really want to know whether the SAT-493M model supports 512×512 remote sensing image input without being resized to 224×224 by the image processor. Is it possible to directly modify the preprocessor_config.json of the SAT-493M model to 512×512? Thank you very much for your answer.

kb077 avatar Aug 26 '25 14:08 kb077

Hi @kb077, because DINOv3 uses RoPE embeddings, it can take in images of any resolution. Indeed, the SAT-493M model had high-resolution fine-tuning at 512x512. You can definitely adjust it to take in 512x512 images while keeping the backbone frozen. DINOv3 is SOTA on iSAID, which trains on 896x896 patches, with a frozen backbone.

JohnMBrandt avatar Aug 26 '25 15:08 JohnMBrandt
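
To make the resolution point concrete, here is a minimal sketch of how the number of patch tokens grows with input size. It assumes a ViT patch size of 16 for the DINOv3 backbone; `PATCH_SIZE` and `token_grid` are illustrative names and not part of the dinov3 repository.

```python
# Assumption: the DINOv3 ViT backbone splits images into 16x16 patches.
PATCH_SIZE = 16


def token_grid(height, width, patch_size=PATCH_SIZE):
    """Return (rows, cols, num_patch_tokens) for an input of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width should be multiples of the patch size")
    rows = height // patch_size
    cols = width // patch_size
    return rows, cols, rows * cols


if __name__ == "__main__":
    # Resolutions mentioned in this thread: default 224, fine-tuning 512, iSAID 896.
    for side in (224, 512, 896):
        rows, cols, count = token_grid(side, side)
        print(f"{side}x{side} -> {rows}x{cols} grid, {count} patch tokens")
```

Under that assumption, a 512x512 image yields a 32x32 grid (1024 patch tokens) instead of the 14x14 grid produced at 224x224, so memory and compute rise accordingly even with a frozen backbone.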

Thank you very much for your reply!

kb077 avatar Aug 27 '25 03:08 kb077

Has anyone worked on multi-class segmentation of satellite images (high resolution and low resolution)? I'm currently working on urban informal settlement (slum) detection. Since this is pixel-wise classification, how can I utilize DINOv3 for my downstream task?

Rupesh4604 avatar Sep 14 '25 18:09 Rupesh4604
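
One simple baseline for pixel-wise classification on top of a frozen backbone is a linear classifier applied to each patch feature, followed by nearest-neighbour upsampling to pixel resolution. The sketch below shows only that idea in plain Python. It is not the DPT or UperNet head used for the geospatial benchmarks mentioned earlier in the thread, and the function names, the patch size of 16, and the toy features are all assumptions made for illustration.

```python
# Assumption: each backbone patch token covers a 16x16 pixel block.
PATCH_SIZE = 16


def classify_patches(features, weights, biases):
    """Assign a class id to every patch.

    features: grid (rows x cols) of D-dimensional patch feature vectors.
    weights:  C x D linear classifier weights (hypothetical, learned elsewhere).
    biases:   C bias terms.
    """
    labels = []
    for row in features:
        label_row = []
        for vec in row:
            scores = [
                sum(w_i * x_i for w_i, x_i in zip(w, vec)) + b
                for w, b in zip(weights, biases)
            ]
            # Pick the class with the highest score for this patch.
            label_row.append(max(range(len(scores)), key=scores.__getitem__))
        labels.append(label_row)
    return labels


def upsample_nearest(labels, patch_size=PATCH_SIZE):
    """Repeat each patch label over its patch_size x patch_size pixel block."""
    mask = []
    for label_row in labels:
        pixel_row = [c for c in label_row for _ in range(patch_size)]
        mask.extend(list(pixel_row) for _ in range(patch_size))
    return mask


if __name__ == "__main__":
    # Toy 2x2 grid of 2-dimensional features standing in for backbone outputs.
    toy_features = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 0.0]]]
    toy_weights = [[1.0, 0.0], [0.0, 1.0]]
    toy_biases = [0.0, 0.0]
    patch_labels = classify_patches(toy_features, toy_weights, toy_biases)
    mask = upsample_nearest(patch_labels)
    print(len(mask), len(mask[0]))
```

In practice the features would come from the frozen DINOv3 backbone and the classifier would be trained on labelled masks; a decoder such as DPT or UperNet replaces the linear layer when sharper boundaries are needed.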