
How to replicate attention maps in object detection

chandlerbing65nm opened this issue 2 years ago · 5 comments

Can you share the code on how to visualize attention maps in object detection like the one shown in your paper?

[image: Figure 3 from the paper, visualizing attention maps from the efficient attention module]

chandlerbing65nm avatar Feb 25 '22 07:02 chandlerbing65nm

Hi Chandler,

The visualization code was part of my company's codebase at the time. Because it was not part of this open-source project, I believe they will not release it. (I also no longer have access to it since I left the company.)

The logic is very simple though. We were visualizing each channel in the keys. For keys of shape [n, d_k, h, w], we slice them into n * d_k tensors, each of shape [1, 1, h, w]. Since we were visualizing the softmax variant, each element is in the range (0, 1), which is easy to paint as a greyscale image.
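
Roughly, a minimal sketch of that logic in PyTorch (not the original code; the function name and file naming are illustrative, and `keys` is assumed to be the softmax-normalized key tensor of shape [n, d_k, h, w] taken from the efficient attention module):

```python
import torch
from torchvision.utils import save_image


def visualize_key_channels(keys: torch.Tensor, prefix: str = "attn_map") -> None:
    """Save each key channel as a greyscale image.

    Assumes `keys` has shape [n, d_k, h, w] with values in (0, 1),
    i.e. it has already been softmax-normalized over spatial positions.
    """
    n, d_k, h, w = keys.shape
    for i in range(n):
        for j in range(d_k):
            channel = keys[i, j].detach().cpu()        # [h, w], values in (0, 1)
            # save_image expects values in [0, 1]; a [1, h, w] tensor is saved
            # as a greyscale-looking image.
            save_image(channel.unsqueeze(0), f"{prefix}_sample{i}_channel{j}.png")
```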

cmsflash avatar Feb 25 '22 18:02 cmsflash

@cmsflash In the image above (Figure 3 in the paper), the description says it is a visualization of attention maps from the efficient attention module. Yet you mention here that the visualization is done only on the keys.

I thought you visualized the attention maps from the output of the module.

chandlerbing65nm avatar Feb 28 '22 11:02 chandlerbing65nm

The description says the figure is visualizing the "global attention maps" from the efficient attention module. The "global attention maps" are the individual channels in the keys.

cmsflash avatar Feb 28 '22 21:02 cmsflash

> the visualization is done only on the keys

@cmsflash I'm confused here. Isn't it the case that global attention can only be extracted when we compute softmax(QK^T / sqrt(d_k)) V (or a variation of it)?

If only the channels of the keys are visualized, then that is just spatial information from the input image; no attention has been extracted yet.

chandlerbing65nm avatar Mar 01 '22 02:03 chandlerbing65nm

The attention maps generated from QK^T are the pixel-wise attention maps. In our terminology, the "global attention maps" are the individual channels in K. Please check Section 3.4 in the paper for the reasoning behind the terminology.
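
For concreteness, here is a rough PyTorch sketch contrasting the two notions (names and shapes are illustrative, and it assumes feature maps flattened to h * w positions, as in the paper's formulation where keys are softmax-normalized over positions):

```python
import torch
import torch.nn.functional as F


def pixel_wise_attention_maps(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Standard dot-product attention: one [h*w, h*w] map per sample,
    i.e. a separate attention map for every query pixel.
    q, k: [n, d_k, h*w]."""
    d_k = q.shape[1]
    return F.softmax(q.transpose(1, 2) @ k / d_k ** 0.5, dim=-1)  # [n, h*w, h*w]


def global_attention_maps(k: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Efficient attention: softmax over the spatial positions of the keys.
    Each of the d_k channels then sums to 1 over all positions and can be
    read as a single "global attention map" over the whole image.
    k: [n, d_k, h*w]."""
    k = F.softmax(k, dim=-1)                 # normalize each channel over positions
    return k.reshape(k.shape[0], -1, h, w)   # [n, d_k, h, w]


if __name__ == "__main__":
    n, d_k, h, w = 2, 8, 32, 32
    k = torch.randn(n, d_k, h * w)
    maps = global_attention_maps(k, h, w)    # [2, 8, 32, 32]; each channel sums to 1
```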

cmsflash-pony avatar Mar 01 '22 21:03 cmsflash-pony