[FEATURE] Add ViT weights: RADIO

Open seefun opened this issue 9 months ago • 2 comments

https://github.com/NVlabs/RADIO

The code and model weights for the paper [CVPR 2024] AM-RADIO: Agglomerative Vision Foundation Model - Reduce All Domains Into One have been released by NVIDIA.

RADIO, a new vision foundation model (in practice, a new set of pretrained ViT weights), excels across visual domains and serves as a superior replacement for vision backbones. It integrates CLIP variants, DINOv2, and SAM through distillation while preserving their unique features, such as text grounding and segmentation correspondence.

seefun, May 14 '24 12:05

Does RADIO have ImageNet-1k heads?

NightMachinery, May 22 '24 02:05

> Does RADIO have ImageNet-1k heads?

I haven't seen any yet. However, I noticed that the new RADIOv2.5 model has been released, which merges knowledge from DFN CLIP, DINOv2, SigLIP, and SAM through multi-teacher distillation. It looks very practical for downstream tasks. https://github.com/NVlabs/RADIO/blob/main/RADIOv2.5_tech_report.md
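For anyone unfamiliar with the idea, here is a minimal sketch of what a multi-teacher distillation objective can look like. It is an illustration only, not RADIO's actual loss: it assumes one linear adapter head per teacher and a cosine-distance feature-matching term, and the names `cosine_distance`, `linear`, and `multi_teacher_loss` are hypothetical. See the tech report for the real formulation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def linear(weight, x):
    """Apply a bias-free linear map; weight is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def multi_teacher_loss(student_feat, heads, teacher_feats, teacher_weights=None):
    """Sum of per-teacher feature-matching losses (assumed form, not RADIO's).

    student_feat:    shared student feature vector
    heads:           dict teacher name -> adapter weight matrix (hypothetical)
    teacher_feats:   dict teacher name -> target feature vector from that teacher
    teacher_weights: optional dict teacher name -> scalar loss weight
    """
    total = 0.0
    for name, target in teacher_feats.items():
        # Project the shared student feature into this teacher's feature space.
        pred = linear(heads[name], student_feat)
        weight = 1.0 if teacher_weights is None else teacher_weights.get(name, 1.0)
        total += weight * cosine_distance(pred, target)
    return total
```

The point of the per-teacher heads is that a single student backbone can be trained against several teachers with different feature spaces at once, with only the lightweight heads being teacher-specific.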

seefun, Oct 12 '24 04:10