
[FEATURE] Visualize gradient maps for attention-based networks

Open AmbiTyga opened this issue 4 years ago • 3 comments

Facebook AI Research recently released a method called DINO. While going through its repository, I found that it includes a way to visualize what the network is attending to (similar to Grad-CAM). To implement this, we would need to add some methods to the VisionTransformer class in timm.models.vision_transformer; I would like your permission to make these changes. A rough sketch of the resulting usage follows the list below. For reference, see: https://github.com/facebookresearch/dino/blob/main/vision_transformer.py

Methods to add from that file:

  • interpolate_pos_encoding
  • forward_selfattention
  • forward_return_n_last_blocks
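
For illustration, here is a minimal sketch of the kind of visualization these methods enable: grabbing the last block's self-attention with a forward hook and reading off the CLS token's attention over the patches. It assumes timm's explicit attention path, where the softmaxed attention matrix passes through `blocks[-1].attn.attn_drop`; the hook point and shapes are assumptions for this sketch, not part of timm's public API.

```python
import timm
import torch

model = timm.create_model('vit_base_patch16_224', pretrained=False).eval()

attn_maps = []

def save_attn(module, inputs, output):
    # In eval mode the dropout is the identity, so `output` is the softmaxed
    # attention matrix of shape (B, num_heads, N, N).
    attn_maps.append(output.detach())

# Assumed hook point: the dropout applied to the attention weights in the
# last transformer block (holds for timm's non-fused attention path).
handle = model.blocks[-1].attn.attn_drop.register_forward_hook(save_attn)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
handle.remove()

attn = attn_maps[0]            # (1, heads, 197, 197) for a 16x16-patch 224 input
cls_attn = attn[0, :, 0, 1:]   # CLS token's attention over the 196 patch tokens
side = int(cls_attn.shape[-1] ** 0.5)
cls_attn = cls_attn.reshape(-1, side, side)  # (heads, 14, 14) maps to upsample and overlay
```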

AmbiTyga — May 03 '21 04:05

@AmbiTyga that adds a significant amount of non-trivial code to the base model for a fairly specific feature. Considering that there are now vit/deit, pit, tnt, swin, and soon cait and others as well, it's not a scalable or maintainable approach.

If someone came up with a flexible, hook-based wrapper/adapter approach that could support each of the vision transformers here without major additions to the base models (just some metadata), I'd accept that.
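
One possible shape for such an adapter, as a hedged sketch rather than a final design: the base models stay untouched, and the only per-architecture "metadata" is a list of module names where the attention weights can be intercepted. The `AttentionExtractor` class and the hook-point names below are illustrative assumptions, not existing timm code.

```python
import timm
import torch

class AttentionExtractor:
    """Collect tensors from named submodules via forward hooks, leaving the model untouched."""

    def __init__(self, model, hook_names):
        self.model = model
        self.maps = {}
        self._handles = []
        modules = dict(model.named_modules())
        for name in hook_names:
            self._handles.append(modules[name].register_forward_hook(self._save(name)))

    def _save(self, name):
        def hook(module, inputs, output):
            self.maps[name] = output.detach()
        return hook

    def __call__(self, x):
        self.maps.clear()
        with torch.no_grad():
            out = self.model(x)
        return out, dict(self.maps)

    def remove(self):
        for h in self._handles:
            h.remove()

# Per-model metadata: where to tap the attention weights. For timm's ViT this
# is the attention dropout in each block (an assumption tied to the current
# non-fused attention implementation); other architectures would supply their
# own lists of module names.
model = timm.create_model('vit_base_patch16_224', pretrained=False).eval()
hook_names = [f'blocks.{i}.attn.attn_drop' for i in range(len(model.blocks))]

extractor = AttentionExtractor(model, hook_names)
_, attn_maps = extractor(torch.randn(1, 3, 224, 224))
print(attn_maps['blocks.0.attn.attn_drop'].shape)  # (1, heads, 197, 197)
extractor.remove()
```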

rwightman — May 03 '21 17:05

I should also add that I do have plans to add feature extraction for the ViT networks, like I have for the convnets, so that activations of internal transformer blocks can be extracted. It isn't at the top of my priority list right now.
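
For reference, the convnet feature extraction mentioned above already exists in timm via `features_only`; the analogous support for transformer models (tapping per-block activations) is what's still planned:

```python
import timm
import torch

# Existing convnet API: return intermediate feature maps instead of logits.
model = timm.create_model('resnet50', pretrained=False, features_only=True)
feats = model(torch.randn(1, 3, 224, 224))
for f in feats:
    print(f.shape)  # multi-scale maps at strides 2, 4, 8, 16, 32 by default
```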

rwightman — May 03 '21 18:05

I am working on a utility method, as well as a module, that could cover this for every image model (even non-attention-based ones). Please allow me to create a PR for this.

AmbiTyga — May 05 '21 16:05