
Convert DINOv2 ViT Weights to ViT-Adapter

Open MatCorr opened this issue 1 year ago • 9 comments

In the recently released segmentation notebook, a trained Mask2Former segmenter is loaded. In its structure, it's possible to see that a ViT-Adapter is used as the backbone rather than a standard ViT, which is what DINOv2 produces.

So my question is: how was that model trained? I'm assuming the weights produced by DINOv2 were loaded into a ViT-Adapter (via some sort of conversion) and that the Mask2Former structure was then trained using MMSegmentation, but it's not clear how that was done.

Am I missing something? How was that conversion done?

MatCorr avatar Sep 29 '23 12:09 MatCorr

ViT-Adapter wraps around the DINOv2 model with injector and extractor modules (see the paper here), so all you need to do is build the ViTAdapter model from here and pass in the DINOv2 backbone as the pretrained weights. In the segmentation section of the DINOv2 paper you can see that they train the adapter weights and the head but keep the backbone frozen.
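
For illustration, here is a minimal sketch of that idea. The ViTAdapter import path and constructor arguments are assumptions about whichever ViT-Adapter port you use; only the torch.hub call is the official DINOv2 entry point.

```python
# Minimal sketch, not the exact DINOv2 training code.
import torch
from mmseg_custom.models.backbones import ViTAdapter  # assumed import path

# ViT-S/14-like settings (assumed); match them to the DINOv2 variant you load
backbone = ViTAdapter(patch_size=14, embed_dim=384, depth=12, num_heads=6)

# plain DINOv2 ViT weights from the official hub entry point
dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
missing, unexpected = backbone.load_state_dict(dinov2.state_dict(), strict=False)
print(f'missing: {len(missing)}, unexpected: {len(unexpected)}')

# freeze the ViT weights (as in the DINOv2 segmentation setup); the adapter
# modules and the segmentation head keep requires_grad=True and get trained
vit_keys = set(dinov2.state_dict().keys())
for name, p in backbone.named_parameters():
    if name in vit_keys:
        p.requires_grad = False
```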

dillonalaird avatar Oct 04 '23 21:10 dillonalaird

Ok, thanks!

One thing is still not clear to me, though. Do we have the script for training the Mask2Former model?

MatCorr avatar Oct 05 '23 16:10 MatCorr

It's run using MMLab, specifically MMSegmentation. You can follow the notebook here to load the mmsegmentation config file used to run the model. You may have to modify some of the configuration; I was able to train a smaller DINOv2 backbone with a ViT-Adapter and a Mask2Former head, but it took some time to get everything working.
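
For reference, the rough shape of that workflow with the MMSegmentation 0.x / mmcv 1.x API looks like the sketch below. The config path is a placeholder; point it at the config loaded in the notebook and adjust the dataset and work_dir settings for your setup.

```python
from mmcv import Config
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
from mmseg.apis import train_segmentor

# placeholder path; use the config file loaded in the DINOv2 notebook
cfg = Config.fromfile('dinov2_vitadapter_mask2former_config.py')
cfg.work_dir = './work_dirs/dinov2_mask2former'

model = build_segmentor(cfg.model)
model.init_weights()

datasets = [build_dataset(cfg.data.train)]
train_segmentor(model, datasets, cfg, distributed=False, validate=True)
```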

dillonalaird avatar Oct 05 '23 21:10 dillonalaird

Thanks a bunch for the thoughtful response.

I had tried training through MMSegmentation but bumped into some odd errors, so I thought that maybe the training had been done in another way. Since you made it work, I'll go back to trying.

MatCorr avatar Oct 06 '23 10:10 MatCorr

> It's run using MMLab, specifically MMSegmentation. You can follow the notebook here to load the mmsegmentation config file used to run the model. […]

There are weight mismatches when loading the DINOv2 backbone state_dict into ViTAdapter. See the screenshot below: [image]

AlessioQuercia avatar Oct 17 '23 13:10 AlessioQuercia

> It's run using MMLab, specifically MMSegmentation. You can follow the notebook here to load the mmsegmentation config file used to run the model. […]

Hi, I am trying to train the DINO backbone with ViT-Adapter, but I got a "NotImplementedError: You must implement either the backward or vjp method for your custom autograd.Function to use it with backward mode AD." error. It looks like some part of the code is missing. Did you run into the same issue? Thanks!

lilong-epfl avatar Mar 11 '24 03:03 lilong-epfl

> There are weight mismatches when loading the DINOv2 backbone state_dict into ViTAdapter. […]

Yeah, the DINOv2 weights are slightly different from the ones expected by MMSegmentation / ViTAdapter. You are going to need to convert the state_dict keys.
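
An illustrative sketch of that conversion step is below. The rename rules are examples, not the definitive mapping; load with strict=False first, look at the missing/unexpected keys, and extend the rules until both lists are empty.

```python
import torch

dinov2 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
src = dinov2.state_dict()

# example rules only (assumption): e.g. LayerScale naming may differ between repos
rename_rules = [
    ('ls1.gamma', 'gamma_1'),
    ('ls2.gamma', 'gamma_2'),
]

converted = {}
for key, value in src.items():
    new_key = key
    for old, new in rename_rules:
        new_key = new_key.replace(old, new)
    converted[new_key] = value

# save in the usual mmseg checkpoint layout so it can be used as a pretrained file
torch.save({'state_dict': converted}, 'dinov2_vits14_converted.pth')
```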

> Hi, I am trying to train the DINO backbone with ViT-Adapter, but I got a "NotImplementedError: You must implement either the backward or vjp method…" error. […]

I never had that error, sorry. =/

MatCorr avatar Mar 11 '24 11:03 MatCorr

> There are weight mismatches when loading the DINOv2 backbone state_dict into ViTAdapter. […]

It seems like your DINOv2 checkpoint uses swiglufused as the ffn_layer, but ViTAdapter uses a normal Mlp. You may need to replace the Mlp layer with a SwiGLUFFNFused layer in ViTAdapter; see the sketch below.
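
A rough sketch of that swap, assuming the facebookresearch/dinov2 repo is on the PYTHONPATH and that the ViTAdapter blocks expose their FFN as blk.mlp with timm-style fc1/fc2 layers; verify both against your implementation.

```python
from dinov2.layers import SwiGLUFFNFused  # from the facebookresearch/dinov2 repo


def swap_mlp_for_swiglu(vit_adapter):
    """Replace each block's plain Mlp with SwiGLUFFNFused so the swiglufused
    checkpoint keys (mlp.w12 / mlp.w3) can be loaded."""
    for blk in vit_adapter.blocks:
        dim = blk.mlp.fc1.in_features       # embedding dim of the block
        hidden = blk.mlp.fc1.out_features   # mlp_ratio * dim in the plain Mlp
        blk.mlp = SwiGLUFFNFused(in_features=dim, hidden_features=hidden)
    return vit_adapter
```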

hubhub086 avatar May 02 '24 17:05 hubhub086

> Hi, I am trying to train the DINO backbone with ViT-Adapter, but I got a "NotImplementedError: You must implement either the backward or vjp method…" error. […]

The "MSDeformAttnFunction" class in this repository seems to be missing the backward function. If you want to train the adapter, you can refer to the code at this link, which includes the backward function as well.
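
For reference, in the standard Deformable-DETR-style ops package (which ViT-Adapter also uses), the function looks roughly like the sketch below. It dispatches to the compiled MultiScaleDeformableAttention CUDA extension, so you still need to build that extension or copy the full ops directory from the linked repo.

```python
import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable

import MultiScaleDeformableAttention as MSDA  # compiled CUDA extension


class MSDeformAttnFunction(Function):
    @staticmethod
    def forward(ctx, value, value_spatial_shapes, value_level_start_index,
                sampling_locations, attention_weights, im2col_step):
        ctx.im2col_step = im2col_step
        output = MSDA.ms_deform_attn_forward(
            value, value_spatial_shapes, value_level_start_index,
            sampling_locations, attention_weights, ctx.im2col_step)
        ctx.save_for_backward(value, value_spatial_shapes, value_level_start_index,
                              sampling_locations, attention_weights)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output):
        (value, value_spatial_shapes, value_level_start_index,
         sampling_locations, attention_weights) = ctx.saved_tensors
        grad_value, grad_sampling_loc, grad_attn_weight = MSDA.ms_deform_attn_backward(
            value, value_spatial_shapes, value_level_start_index,
            sampling_locations, attention_weights, grad_output.contiguous(),
            ctx.im2col_step)
        # gradients only for the tensor inputs; None for shapes, indices, im2col_step
        return grad_value, None, None, grad_sampling_loc, grad_attn_weight, None
```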

Vishwesh4 avatar May 03 '24 22:05 Vishwesh4