On-Improving-Adversarial-Transferability-of-Vision-Transformers

Bug in TransformerHead Implementation

Open · LukasStruppek opened this issue 1 year ago · 0 comments

Hi,

Thanks for the great paper; I really enjoyed reading it. I am currently trying to build on your work and re-implement the refinement module. If I understand the code correctly, the refinement module is defined as TransformerHead in the file deit_ensemble.py.

I think there might be a bug in the implementation. More specifically, the convolutional components are defined as follows in line 27:

# To process patches
self.conv = nn.Conv2d(self.token_dim, self.token_dim, kernel_size=3, stride=stride, padding=1, bias=False)    
self.bn = nn.BatchNorm2d(self.token_dim)    
self.conv = nn.Conv2d(self.token_dim, self.token_dim, kernel_size=3, stride=1, padding=1, bias=False)    
self.bn = nn.BatchNorm2d(self.token_dim)

It seems that the second conv and bn layers overwrite the first two. As a result, in the forward method (line 50), the feature maps are processed twice by the same conv layer and bn layer:

patch_tokens = rearrange(patch_tokens, 'b (h w) d -> b d h w', h=size, w=size)  
features = F.relu(self.bn(self.conv(patch_tokens))) 
features = self.bn(self.conv(features))  
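
For reference, here is a minimal, self-contained sketch of what I would have expected instead, assuming two distinct conv/bn pairs were actually intended. The class name RefinementConvBlock and the conv1/bn1/conv2/bn2 names are my own for illustration, not from your repository:

import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange

class RefinementConvBlock(nn.Module):
    # Hypothetical fix (not the repository's code): keep both conv/bn
    # pairs as distinct modules so neither overwrites the other.
    def __init__(self, token_dim, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(token_dim, token_dim, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(token_dim)
        self.conv2 = nn.Conv2d(token_dim, token_dim, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(token_dim)

    def forward(self, patch_tokens, size):
        # Reshape patch tokens (batch, tokens, dim) into a feature map (batch, dim, h, w)
        patch_tokens = rearrange(patch_tokens, 'b (h w) d -> b d h w', h=size, w=size)
        # Each stage now uses its own conv/bn parameters
        features = F.relu(self.bn1(self.conv1(patch_tokens)))
        features = self.bn2(self.conv2(features))
        return features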

Is this behavior intended, am I misunderstanding part of the implementation, or is it simply a bug?

Looking forward to hearing from you.

Best, Lukas

LukasStruppek · Apr 14 '23 09:04