Attention weight score
Hi, I was reading your work and was wondering how you obtain the highest attention weight for the feature maps in Figure 5. Do you just sum the tensor along the channel dimension and sort that, or do you use some other method? Thanks!
We use exactly the same method as CAIN but plot only the highest-activation feature. https://www.dropbox.com/s/b62wnroqdd5lhfc/AAAI-ChoiM.4773.pdf?dl=0
Hi, I have taken a look at the CAIN paper and also the repository; however, there is no explanation of how they get the attention score. How do you obtain the attention score in the first place?
During training, we compute a separate attention score for each channel in each layer. To visualize, we pick a layer, compute the attention weights for all of its channels, and plot the channel maps with the largest attention values as RGB images.
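For reference, here is a minimal sketch of that idea, assuming a squeeze-and-excitation-style channel attention block similar in spirit to CAIN's; the class and variable names below are illustrative, not the actual repository code. The attention score for a channel is the per-channel scalar the module produces, and visualization just ranks channels by that scalar and plots the top-scoring maps.

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

class ChannelAttention(nn.Module):
    """Illustrative squeeze-and-excitation-style channel attention (not the repo's code)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # squeeze: one value per channel
        self.fc = nn.Sequential(                            # excitation: MLP -> per-channel weight in (0, 1)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(self.pool(x))                           # shape (B, C, 1, 1): attention score per channel
        return x * w, w                                     # reweighted features and the raw scores

# Toy usage: rank the channels of a (hypothetical) feature map by attention score
# and plot the single highest-scoring channel as a heat map.
feat = torch.randn(1, 64, 32, 32)                           # stand-in for a layer's feature map
ca = ChannelAttention(64)
out, weights = ca(feat)

scores = weights.flatten()                                  # 64 per-channel attention values
top = torch.argsort(scores, descending=True)[:3]            # indices of the 3 most-attended channels
print("top channels:", top.tolist())

plt.imshow(feat[0, top[0]].detach().numpy(), cmap="viridis")
plt.title(f"channel {top[0].item()} (attention {scores[top[0]].item():.3f})")
plt.savefig("top_channel.png")
```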
Can you elaborate on how you compute this attention score? Also, if you compute an attention value for each channel (say, a layer with 64 channels), would you take the 3 channels with the largest attention values and concatenate them to obtain the RGB image, or would you use one channel and fill the other two channels with ones?