pytorch-deep-learning
Interpretation of patches for ViT
Imo the "default" first steps for an image going through ViT-B/16 are:
- creating patches with `torch.nn.Unfold`
- doing the linear projection with `torch.nn.Linear`
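A minimal sketch of those two steps (shapes and the 768 embedding dim follow ViT-B/16; the variable names are my own):

```python
import torch
import torch.nn as nn

patch_size, embed_dim = 16, 768

# Step 1: cut the image into non-overlapping 16x16 patches.
unfold = nn.Unfold(kernel_size=patch_size, stride=patch_size)
# Step 2: linearly project each flattened patch to the embedding dim.
proj = nn.Linear(3 * patch_size * patch_size, embed_dim)

x = torch.randn(1, 3, 224, 224)      # (B, C, H, W)
patches = unfold(x)                  # (B, C*p*p, N) = (1, 768, 196)
patches = patches.transpose(1, 2)    # (B, N, C*p*p) = (1, 196, 768)
tokens = proj(patches)               # (B, N, embed_dim) = (1, 196, 768)
```

For a 224x224 image this yields N = (224/16)^2 = 196 patch tokens.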
The notebook implements a somewhat scaled-down hybrid approach, which is NOT equivalent. If you check the appendix of the paper, their hybrids are ResNet X + ViT Y: they take the output feature maps of a ResNet and feed them as input to a ViT.