3DUX-Net
Similarities between the weighted sum approach in self-attention and the convolution per-channel basis
Your ideas are great! But I have a question. In the section "Volumetric Depth-wise Convolution with LKs", the paper states: "Inspired by the idea of depth-wise convolution, we have found similarities between the weighted sum approach in self-attention and the convolution per-channel basis." I did not find a clear explanation of this in the article. Could you explain how to understand this sentence?
Thank you for your interest in our work. Great question! In the Swin Transformer approach, the computation of self-attention within a window is very similar to convolution on a per-channel basis. For example, a 7x7 window in Swin Transformer is further divided into sub-windows to compute self-attention for finer-grained details. The results are then weighted-summed together, which is similar to the depthwise convolution approach (performing convolution on each channel independently). That is the similarity between the Swin Transformer block and the convolution block.
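To make the analogy concrete, here is a minimal PyTorch sketch (not the actual 3DUX-Net code; the channel count and input size are illustrative). Setting `groups` equal to the number of channels makes each channel convolved independently with its own 7x7x7 kernel, so every output voxel is a weighted sum over a local window within a single channel, analogous to how window self-attention forms a weighted sum of values inside each window:

```python
import torch
import torch.nn as nn

# Illustrative sketch: volumetric depthwise convolution with a large kernel.
channels = 48  # hypothetical channel count, not taken from the paper
dwconv = nn.Conv3d(
    in_channels=channels,
    out_channels=channels,
    kernel_size=7,    # large kernel (LK), mirroring the 7x7 window analogy
    padding=3,        # keep the spatial size unchanged
    groups=channels,  # depthwise: one filter per channel, no channel mixing
)

x = torch.randn(1, channels, 32, 32, 32)  # (batch, channels, D, H, W)
out = dwconv(x)
print(out.shape)  # torch.Size([1, 48, 32, 32, 32])
```

The key difference is that the depthwise kernel weights are fixed after training, whereas window self-attention computes its weights from the input; the per-channel weighted-sum structure is what the two share.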
I am closing the older bug reports, as these were missed. We are now tracking reports more closely across the organization. Please re-open if this continues to be a blocker.