Phil Wang


It only contains the masked patches - getting back the whole image will take some more code. You'd need to unsort the masked and unmasked patches back together, and then...
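For context, a minimal sketch of that reassembly step, assuming you kept the masked / unmasked index tensors from the MAE forward pass (all names below are illustrative, not part of the library):

```python
import torch

def reassemble_patches(pred_masked, unmasked, masked_indices, unmasked_indices):
    # pred_masked:      (batch, num_masked, patch_dim)   - decoder reconstructions
    # unmasked:         (batch, num_unmasked, patch_dim) - original visible patches
    # masked_indices:   (batch, num_masked)   - positions the masked patches came from
    # unmasked_indices: (batch, num_unmasked) - positions of the visible patches
    batch, device = pred_masked.shape[0], pred_masked.device
    num_patches = masked_indices.shape[-1] + unmasked_indices.shape[-1]
    patch_dim = pred_masked.shape[-1]

    full = torch.zeros(batch, num_patches, patch_dim, device = device, dtype = pred_masked.dtype)
    batch_range = torch.arange(batch, device = device)[:, None]

    full[batch_range, unmasked_indices] = unmasked   # visible patches go back to their original positions
    full[batch_range, masked_indices] = pred_masked  # reconstructed patches fill the masked positions

    # `full` is still a sequence of flattened patches - fold it back into an image,
    # e.g. rearrange(full, 'b (h w) (p1 p2 c) -> b c (h p1) (w p2)', ...) with einops
    return full
```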

Hi Rhinigtas! Could you show what your full training script looks like? Perhaps I can spot the error more easily that way

@xinchenduobian sure, try DeiT, or any of the models that have some hierarchical pooling mixed in

@marcomameli1992 hi Marco, you just need to modify ViT to have a return statement here https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py#L123 for the embeddings. I guess I could add this, but I don't want to...
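Roughly what that modification could look like - a sketch that subclasses ViT rather than editing it in place (attribute names follow the current vit.py and may differ across versions):

```python
import torch
from einops import repeat
from vit_pytorch import ViT

class ViTWithEmbeddings(ViT):
    # same forward pass as vit.py, plus an early return of the token embeddings
    def forward(self, img, return_embeddings = False):
        x = self.to_patch_embedding(img)
        b, n, _ = x.shape

        cls_tokens = repeat(self.cls_token, '1 1 d -> b 1 d', b = b)
        x = torch.cat((cls_tokens, x), dim = 1)
        x += self.pos_embedding[:, :(n + 1)]
        x = self.dropout(x)

        x = self.transformer(x)

        if return_embeddings:
            return x  # per-token embeddings, before pooling and the MLP head

        x = x.mean(dim = 1) if self.pool == 'mean' else x[:, 0]
        x = self.to_latent(x)
        return self.mlp_head(x)
```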

@marcomameli1992 what do you mean by the positional embedding? The absolute positional embeddings are added at the beginning, before the patch sequence is fed through the attention layers, and can be accessed...
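A minimal sketch of reading that parameter off a stock ViT (hyperparameters are just the README example values):

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048
)

pos_emb = v.pos_embedding  # learned parameter, shape (1, num_patches + 1, dim)
print(pos_emb.shape)       # torch.Size([1, 65, 1024]) - the extra position is for the CLS token
```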

@marcomameli1992 actually, let me just write up a layer extractor that can wrap the ViT and return all these intermediates, similar to https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/recorder.py
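For reference, the Recorder pattern being pointed to wraps an existing ViT and returns the attention maps alongside the predictions, roughly:

```python
import torch
from vit_pytorch import ViT
from vit_pytorch.recorder import Recorder

v = ViT(image_size = 256, patch_size = 32, num_classes = 1000,
        dim = 1024, depth = 6, heads = 16, mlp_dim = 2048)
v = Recorder(v)  # wraps the model without modifying it

img = torch.randn(1, 3, 256, 256)
preds, attns = v(img)  # attns: (batch, layers, heads, patches + 1, patches + 1)
```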

@marcomameli1992 does this work for you? https://github.com/lucidrains/vit-pytorch/tree/0.25.1#accessing-embeddings
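For anyone landing here later, the linked README section comes down to wrapping the model with the Extractor, roughly:

```python
import torch
from vit_pytorch import ViT
from vit_pytorch.extractor import Extractor

v = ViT(image_size = 256, patch_size = 32, num_classes = 1000,
        dim = 1024, depth = 6, heads = 16, mlp_dim = 2048)
v = Extractor(v)

img = torch.randn(1, 3, 256, 256)
logits, embeddings = v(img)  # embeddings: (batch, patches + 1, dim), including the CLS token
```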

@yzb1997 hey, this may be helpful https://github.com/lucidrains/vit-pytorch/issues/169

@dnecho ohh yes, I'm actually not so sure about that - you may be right that it isn't necessary for the unmasked tokens

@dnecho it probably wouldn't hurt to keep it the way it is