Is it possible to use this method for ViTs trained for regression?
I fine-tuned a model from `timm` for a regression problem by setting `nb_classes` to 1 and training with MSE loss. How can I use this repo to generate a saliency map when there are no distinct classes?
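For context, my fine-tuning setup is roughly the following (simplified sketch; the model name, data, and shapes are just placeholders, and I'm assuming my script's `nb_classes` ends up as timm's `num_classes` argument):

```python
import timm
import torch
import torch.nn as nn

# A single output unit turns the classification head into a regression head.
model = timm.create_model('vit_base_patch16_224', pretrained=True, num_classes=1)
criterion = nn.MSELoss()

images = torch.randn(8, 3, 224, 224)   # dummy batch; real data is preprocessed images
targets = torch.randn(8, 1)            # continuous targets, not class labels

preds = model(images)                  # shape [8, 1] because num_classes=1
loss = criterion(preds, targets)
loss.backward()
```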
I also have the same question for the authors.
Hello,
I am also trying to apply this method to models that were fine-tuned for regression tasks. These models use a custom Vision Transformer implementation from the RETFound repository (specifically the ViT-L architecture with 16×16 patch size).
I successfully loaded their weights using the vit_large_patch16_224() function from this repo (from ViT_LRP.py), setting num_classes=1.
To extract relevance maps, I then use an adapter that wraps the LRP class and calls generate_LRP() with index=None and method='transformer_attribution'. I also post-process the output to reshape the relevance scores to a [14,14] patch grid.
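Roughly, the adapter does the following (simplified sketch; the import paths may differ slightly depending on the repo layout, and the checkpoint path and key lookup are placeholders for however the RETFound weights were saved):

```python
import torch
from baselines.ViT.ViT_LRP import vit_large_patch16_224
from baselines.ViT.ViT_explanation_generator import LRP

# Build the LRP-compatible ViT-L/16 with a single regression output.
model = vit_large_patch16_224(pretrained=False, num_classes=1)

# Load the RETFound fine-tuned weights (placeholder path; adjust the key
# lookup to match how the checkpoint was actually saved).
checkpoint = torch.load('retfound_finetuned.pth', map_location='cpu')
state_dict = checkpoint.get('model', checkpoint)
model.load_state_dict(state_dict, strict=False)
model.eval()

# Wrap the model with the repo's explanation generator.
attribution_generator = LRP(model)

image = torch.randn(1, 3, 224, 224)  # dummy tensor; a real, normalized image in practice

# With a single output unit there is no class index to choose, so index=None.
relevance = attribution_generator.generate_LRP(
    image,
    index=None,
    method='transformer_attribution',
).detach()

# Post-process: reshape the 196 patch relevances into the 14x14 grid of a 224/16 ViT.
relevance_map = relevance.reshape(1, 1, 14, 14)
```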
Do you think your method can be applied correctly in this setup?