
Convert DINOv2 ViT Weights to ViT-Adapter

Open MatCorr opened this issue 1 year ago • 9 comments

In the recently released segmentation notebook, a trained Mask2Former segmenter is loaded. In its structure, it's possible to see that a ViT-Adapter is used as the backbone rather than a standard ViT, which is what DINOv2 produces.

So my question is: how was that model trained? I'm assuming the weights produced by DINOv2 were loaded into a ViT-Adapter (via some sort of conversion) and that the Mask2Former structure was then trained using MMSegmentation, but it's not clear how that was done.

Am I missing something? How was that conversion done?

MatCorr avatar Sep 29 '23 12:09 MatCorr

ViT-Adapter wraps around the DINOv2 model with injector and extractor modules (see the paper here), so all you need to do is build the ViTAdapter model from here and pass in the DINOv2 backbone as the pretrained weights. In the segmentation section of the DINOv2 paper you can see that they train the adapter weights and the head but keep the backbone frozen.
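
For illustration, here is a minimal sketch of that idea. The ViTAdapter import path and constructor arguments are assumptions about whichever ViT-Adapter port you use; only the torch.hub call is the official DINOv2 entry point.

```python
# Minimal sketch, not the exact DINOv2 training code.
import torch
from mmseg_custom.models.backbones import ViTAdapter  # assumed import path

# ViT-S/14-like settings (assumed); match them to the DINOv2 variant you load
backbone = ViTAdapter(patch_size=14, embed_dim=384, depth=12, num_heads=6)

# plain DINOv2 ViT weights from the official hub entry point
dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
missing, unexpected = backbone.load_state_dict(dinov2.state_dict(), strict=False)
print(f'missing: {len(missing)}, unexpected: {len(unexpected)}')

# freeze the ViT weights (as in the DINOv2 segmentation setup); the adapter
# modules and the segmentation head keep requires_grad=True and get trained
vit_keys = set(dinov2.state_dict().keys())
for name, p in backbone.named_parameters():
    if name in vit_keys:
        p.requires_grad = False
```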

dillonalaird avatar Oct 04 '23 21:10 dillonalaird

Ok, thanks!

One thing is still not clear to me, though. Do we have the script for training the Mask2Former model?

MatCorr avatar Oct 05 '23 16:10 MatCorr

It's run using MMLab, specifically MMSegmentation. You can follow the notebook here to load the mmsegmentation config file used to run the model. You may have to modify some of the configuration; I was able to train a smaller DINOv2 backbone with a ViT-Adapter and a Mask2Former head, but it took some time to get everything working.
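
For reference, the rough shape of that workflow with the MMSegmentation 0.x / mmcv 1.x API looks like the sketch below. The config path is a placeholder; point it at the config loaded in the notebook and adjust the dataset and work_dir settings for your setup.

```python
from mmcv import Config
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
from mmseg.apis import train_segmentor

# placeholder path; use the config file loaded in the DINOv2 notebook
cfg = Config.fromfile('dinov2_vitadapter_mask2former_config.py')
cfg.work_dir = './work_dirs/dinov2_mask2former'

model = build_segmentor(cfg.model)
model.init_weights()

datasets = [build_dataset(cfg.data.train)]
train_segmentor(model, datasets, cfg, distributed=False, validate=True)
```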

dillonalaird avatar Oct 05 '23 21:10 dillonalaird

Thanks a bunch for the thoughtful response.

I had tried training through MMSegmentation but bumped into some odd errors, so I thought that maybe the training had been done in another way. Since you made it work, I'll go back to trying.

MatCorr avatar Oct 06 '23 10:10 MatCorr

> It's run using MMLab, specifically MMSegmentation. You can follow the notebook here to load the mmsegmentation config file used to run the model. […]

There are weight mismatches when loading the DINOv2 backbone state_dict into ViTAdapter. See the screenshot below: [image]

AlessioQuercia avatar Oct 17 '23 13:10 AlessioQuercia

> It's run using MMLab, specifically MMSegmentation. You can follow the notebook here to load the mmsegmentation config file used to run the model. […]

Hi, I am trying to train the DINO backbone with ViT-Adapter, but I got a "NotImplementedError: You must implement either the backward or vjp method for your custom autograd.Function to use it with backward mode AD." error. It looks like some part of the code is missing. Did you run into the same issue? Thanks!

lilong-epfl avatar Mar 11 '24 03:03 lilong-epfl

> There are weight mismatches when loading the DINOv2 backbone state_dict into ViTAdapter. […]

Yeah, the DINOv2 weights are slightly different from the ones expected by MMSegmentation / ViTAdapter. You are going to need to convert the state_dict keys.
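
An illustrative sketch of that conversion step is below. The rename rules are examples, not the definitive mapping; load with strict=False first, look at the missing/unexpected keys, and extend the rules until both lists are empty.

```python
import torch

dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
src = dinov2.state_dict()

# example rules only (assumption): e.g. LayerScale naming may differ between repos
rename_rules = [
    ('ls1.gamma', 'gamma_1'),
    ('ls2.gamma', 'gamma_2'),
]

converted = {}
for key, value in src.items():
    new_key = key
    for old, new in rename_rules:
        new_key = new_key.replace(old, new)
    converted[new_key] = value

# save in the usual mmseg checkpoint layout so it can be used as a pretrained file
torch.save({'state_dict': converted}, 'dinov2_vits14_converted.pth')
```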

> Hi, I am trying to train the DINO backbone with ViT-Adapter, but I got a "NotImplementedError: You must implement either the backward or vjp method…" error. […]

I never had that error, sorry. =/

MatCorr avatar Mar 11 '24 11:03 MatCorr

> There are weight mismatches when loading the DINOv2 backbone state_dict into ViTAdapter. […]

It seems like your DINOv2 checkpoint uses swiglufused as the ffn_layer, but ViTAdapter uses a normal Mlp. You may need to replace the Mlp layer with a SwiGLUFFNFused layer in ViTAdapter; see the sketch below.
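
A rough sketch of that swap, assuming the facebookresearch/dinov2 repo is on the PYTHONPATH and that the ViTAdapter blocks expose their FFN as blk.mlp with timm-style fc1/fc2 layers; verify both against your implementation.

```python
from dinov2.layers import SwiGLUFFNFused  # from the facebookresearch/dinov2 repo


def swap_mlp_for_swiglu(vit_adapter):
    """Replace each block's plain Mlp with SwiGLUFFNFused so the swiglufused
    checkpoint keys (mlp.w12 / mlp.w3) can be loaded."""
    for blk in vit_adapter.blocks:
        dim = blk.mlp.fc1.in_features       # embedding dim of the block
        hidden = blk.mlp.fc1.out_features   # mlp_ratio * dim in the plain Mlp
        blk.mlp = SwiGLUFFNFused(in_features=dim, hidden_features=hidden)
    return vit_adapter
```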

hubhub086 avatar May 02 '24 17:05 hubhub086

> Hi, I am trying to train the DINO backbone with ViT-Adapter, but I got a "NotImplementedError: You must implement either the backward or vjp method…" error. […]

The "MSDeformAttnFunction" class in this repository seems to be missing the backward function. If you want to train the adapter, you can refer to the code at this link, which includes the backward function as well.
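
For reference, in the standard Deformable-DETR-style ops package (which ViT-Adapter also uses), the function looks roughly like the sketch below. It dispatches to the compiled MultiScaleDeformableAttention CUDA extension, so you still need to build that extension or copy the full ops directory from the linked repo.

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable

import MultiScaleDeformableAttention as MSDA  # compiled CUDA extension


class MSDeformAttnFunction(Function):
    @staticmethod
    def forward(ctx, value, value_spatial_shapes, value_level_start_index,
                sampling_locations, attention_weights, im2col_step):
        ctx.im2col_step = im2col_step
        output = MSDA.ms_deform_attn_forward(
            value, value_spatial_shapes, value_level_start_index,
            sampling_locations, attention_weights, ctx.im2col_step)
        ctx.save_for_backward(value, value_spatial_shapes, value_level_start_index,
                              sampling_locations, attention_weights)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        (value, value_spatial_shapes, value_level_start_index,
         sampling_locations, attention_weights) = ctx.saved_tensors
        grad_value, grad_sampling_loc, grad_attn_weight = MSDA.ms_deform_attn_backward(
            value, value_spatial_shapes, value_level_start_index,
            sampling_locations, attention_weights, grad_output.contiguous(),
            ctx.im2col_step)
        # gradients only for the tensor inputs; None for shapes, indices, im2col_step
        return grad_value, None, None, grad_sampling_loc, grad_attn_weight, None
```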

Vishwesh4 avatar May 03 '24 22:05 Vishwesh4