
similarities between the weighted sum approach in self-attention and the convolution per-channel basis.

AN-AN-369 opened this issue 1 year ago • 1 comment

Your ideas are great! But I have a question about the section "Volumetric Depth-wise Convolution with LKs:", which states: "Inspired by the idea of depth-wise convolution, we have found similarities between the weighted sum approach in self-attention and the convolution per-channel basis." I could not find a clear explanation of this in the article. How should I understand this sentence?

AN-AN-369 avatar Aug 07 '23 04:08 AN-AN-369

Thank you for your interest in our work. Great question! In the Swin Transformer approach, the computation of self-attention within a window is closely analogous to convolution on a per-channel basis. For example, a 7x7 window in the Swin Transformer is further divided into sub-windows to compute self-attention for finer-grained details. The results are then combined as a weighted sum, which mirrors the depth-wise convolution approach (performing convolution on each channel independently). That is the similarity between the Swin Transformer block and the convolution block.

leeh43 avatar Aug 20 '23 22:08 leeh43
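To make the analogy in the answer above concrete, here is a minimal sketch (not the 3D UX-Net code; the function names and the 1D/toy setting are illustrative assumptions): a depthwise convolution computes, for each channel independently, a weighted sum over a sliding window, which has the same form as the weighted sum that window-based self-attention applies to the values inside a local window. The key difference is that the convolution weights are fixed per channel, while attention weights are computed from the input via a softmax over query-key similarities.

```python
import numpy as np

def depthwise_conv1d(x, kernels):
    """x: (C, L) input; kernels: (C, K), one kernel per channel.
    Each channel is cross-correlated with its own kernel, independently."""
    C, L = x.shape
    K = kernels.shape[1]
    out = np.zeros((C, L - K + 1))
    for c in range(C):                      # channels processed independently
        for i in range(L - K + 1):
            # per-channel weighted sum over the window:
            # same form as attention's sum_j w_j * v_j within a window
            out[c, i] = np.dot(kernels[c], x[c, i:i + K])
    return out

def window_attention_1d(x, window=3):
    """Toy single-head self-attention within non-overlapping windows of a
    single channel: weights are a softmax over dot-product similarities,
    so they depend on the input rather than being fixed parameters."""
    outs = []
    for s in range(0, len(x) - window + 1, window):
        v = x[s:s + window]
        scores = np.outer(v, v)                               # query-key similarity
        w = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        outs.append(w @ v)                                    # weighted sum of values
    return np.concatenate(outs)

x = np.arange(12, dtype=float).reshape(2, 6)   # 2 channels, length 6
k = np.array([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])
print(depthwise_conv1d(x, k))      # fixed per-channel weighted sums
print(window_attention_1d(x[0]))   # input-dependent weighted sums
```

Both functions reduce each window to a weighted sum of the values it contains; that shared structure is what the answer refers to as the similarity between the Swin Transformer block and the depthwise convolution block.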

I am closing the older bug reports as these were missed. We are now better tracking reports across the organization. Please re-open if this continues to be a blocker.

BennettLandman avatar Aug 01 '24 16:08 BennettLandman