
MAE using pretrained VIT

Open Songloading opened this issue 3 years ago • 3 comments

Hi There,

I am currently trying to fine-tune an MAE based on pretrained VIT from timm. However, when I do:

import timm
import torch.nn as nn
from vit_pytorch.mae import MAE

v = timm.create_model('vit_base_patch16_224', pretrained=True)
num_ftrs = v.head.in_features
v.head = nn.Linear(num_ftrs, 2)
model = MAE(
    encoder = v,
    masking_ratio = 0.75,   # the paper recommended 75% masked patches
    decoder_dim = 512,      # paper showed good results with just 512
    decoder_depth = 6       # anywhere from 1 to 8
)

I got "AttributeError: 'VisionTransformer' object has no attribute 'pos_embedding'". It seems that the timm model is not compatible with this MAE implementation. Can this be easily fixed, or will I have to change the internal implementation of MAE?

Songloading avatar Jan 26 '22 08:01 Songloading
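For context, the error comes from an attribute-name mismatch: timm's VisionTransformer stores its positional embeddings under `pos_embed`, while vit_pytorch's MAE reads `encoder.pos_embedding` (the attribute name used by vit_pytorch's own ViT). A minimal pure-Python sketch of the mismatch, with hypothetical stand-in classes (not the real timm/vit_pytorch classes):

```python
# Hypothetical stand-ins illustrating the naming mismatch that raises
# the AttributeError above.

class TimmStyleViT:
    def __init__(self):
        self.pos_embed = "positional embeddings"  # timm's attribute name

class VitPytorchStyleViT:
    def __init__(self):
        self.pos_embedding = "positional embeddings"  # vit_pytorch's attribute name

def access_like_mae(encoder):
    # vit_pytorch's MAE accesses encoder.pos_embedding internally
    return encoder.pos_embedding

try:
    access_like_mae(TimmStyleViT())
except AttributeError as e:
    # raises: no attribute 'pos_embedding' on the timm-style model
    print(e)

print(access_like_mae(VitPytorchStyleViT()))
```

So the timm model would need its positional embeddings (and patch-embedding internals) exposed under the names vit_pytorch's MAE expects, which is more than a one-line rename in practice.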

@Songloading i have no idea! i'm not really familiar with timm - perhaps you can ask Ross about it?

lucidrains avatar Jan 26 '22 17:01 lucidrains

@lucidrains ok. Any ideas on where else to get a pretrained ViT besides their implementation, or a pretrained MAE?

Songloading avatar Jan 26 '22 19:01 Songloading

Did you solve the issue? @Songloading

mw9385 avatar Jul 17 '23 07:07 mw9385