
How was figure 3e generated in the paper?

Open AditMeh opened this issue 11 months ago • 2 comments

[Image: Screenshot 2024-03-25 at 8 51 32 PM]

I used something similar to this to extract the attention scores from the penultimate layer, as described in the caption of Figure 3e. However, the attention maps I get are much less "intuitive" than the ones shown in the figure.
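For concreteness, here is a minimal sketch of the kind of computation I mean. It is not the paper's code. It assumes the per-head query and key tensors of the penultimate block have already been captured (for example with a forward hook), that token 0 is the CLS token, and that there are no extra register tokens. The function name `cls_attention_map` and the `grid` parameter are hypothetical names for illustration.

```python
import numpy as np

def cls_attention_map(q, k, grid):
    """CLS-to-patch attention per head, reshaped to the patch grid.

    q, k: arrays of shape (heads, tokens, head_dim); token 0 is assumed
          to be the CLS token (an assumption, not confirmed by the paper).
    grid: patches per side, e.g. 14 if a 224 px input is split into
          16 px patches (also an assumption).
    """
    heads, tokens, dim = q.shape
    assert tokens == 1 + grid * grid, "expected one CLS token plus grid*grid patches"
    # Scaled dot product of the CLS query against every key: (heads, 1, tokens).
    scores = q[:, :1, :] @ k.transpose(0, 2, 1) / np.sqrt(dim)
    # Numerically stable softmax over all tokens, CLS included.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Drop the CLS-to-CLS weight and lay the patch weights out as a 2D map.
    return weights[:, 0, 1:].reshape(heads, grid, grid)
```

Each head yields one `grid` x `grid` map; whether the figure averages heads or shows a single head is not something I could determine from the caption.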

Was this figure generated with a UNI model fine-tuned on the ROI-level task, or does it just show the attention maps of the SSL model (no fine-tuning)?

Also, are the 448^2, 896^2 and 1344^2 attention maps computed by concatenating the attention maps of the non-overlapping 224^2 patches?
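To make the second question concrete, this is roughly what I mean by concatenating. It is only a sketch of my guess, not a claim about how the figure was actually produced. It assumes one 2D attention map per 224^2 tile, listed in row-major order; `stitch_tiles` and `tiles_per_side` are hypothetical names.

```python
import numpy as np

def stitch_tiles(tile_maps, tiles_per_side):
    """Place per-tile attention maps on one canvas.

    tile_maps: list of equally sized 2D arrays, one per 224^2 tile,
               in row-major order (left to right, then top to bottom).
    tiles_per_side: number of tiles along each edge of the large image.
    """
    assert len(tile_maps) == tiles_per_side ** 2
    rows = [
        # Join the tiles of one row side by side.
        np.concatenate(tile_maps[r * tiles_per_side:(r + 1) * tiles_per_side], axis=1)
        for r in range(tiles_per_side)
    ]
    # Stack the rows from top to bottom.
    return np.concatenate(rows, axis=0)
```

Under this reading, a 448^2 image would use 2 tiles per side, 896^2 would use 4, and 1344^2 would use 6.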

AditMeh, Mar 26 '24 00:03