3DUX-Net
Similarities between the weighted sum approach in self-attention and the convolution per-channel basis
Your ideas are great! But I have a question. In the section "Volumetric Depth-wise Convolution with LKs", the paper states: "Inspired by the idea of depth-wise convolution, we have found similarities between the weighted sum approach in self-attention and the convolution per-channel basis." I did not find a clear explanation of this in the article. Could you explain how to understand this sentence?
Thank you for your interest in our work. Great question! In the Swin Transformer approach, the computation of self-attention within a window is very similar to convolution on a per-channel basis. For example, a 7x7 window in Swin Transformer is further divided into sub-windows to compute self-attention for finer-grained details. The results are then weighted-summed together, which is similar to the depthwise convolution approach (performing convolution on each channel independently). That is the similarity between the Swin Transformer block and the convolution block.
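To make the analogy concrete, here is a minimal PyTorch sketch (not the actual 3DUX-Net code; the channel count and input size are illustrative). Setting `groups` equal to the number of channels makes each channel convolved independently with its own 7x7x7 kernel, so every output voxel is a weighted sum over a local window within a single channel, analogous to how window self-attention forms a weighted sum of values inside each window:

```python
import torch
import torch.nn as nn

# Illustrative sketch: volumetric depthwise convolution with a large kernel.
channels = 48  # hypothetical channel count, not taken from the paper
dwconv = nn.Conv3d(
    in_channels=channels,
    out_channels=channels,
    kernel_size=7,    # large kernel (LK), mirroring the 7x7 window analogy
    padding=3,        # keep the spatial size unchanged
    groups=channels,  # depthwise: one filter per channel, no channel mixing
)

x = torch.randn(1, channels, 32, 32, 32)  # (batch, channels, D, H, W)
out = dwconv(x)
print(out.shape)  # torch.Size([1, 48, 32, 32, 32])
```

The key difference is that the depthwise kernel weights are fixed after training, whereas window self-attention computes its weights from the input; the per-channel weighted-sum structure is what the two share.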
I am closing the older bug reports, as these were missed. We are now tracking reports more closely across the organization. Please re-open if this continues to be a blocker.