Phil Wang


It only contains the masked patches - getting back the whole image will take some more code. You'd need to unsort the masked and unmasked patches back together, and then...
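For context, a minimal sketch of that reassembly step, assuming you kept the masked / unmasked index tensors from the MAE forward pass (all names below are illustrative, not part of the library):

```python
import torch

def reassemble_patches(pred_masked, unmasked, masked_indices, unmasked_indices):
    # pred_masked:      (batch, num_masked, patch_dim)   - decoder reconstructions
    # unmasked:         (batch, num_unmasked, patch_dim) - original visible patches
    # masked_indices:   (batch, num_masked)   - positions the masked patches came from
    # unmasked_indices: (batch, num_unmasked) - positions of the visible patches
    batch, device = pred_masked.shape[0], pred_masked.device
    num_patches = masked_indices.shape[-1] + unmasked_indices.shape[-1]
    patch_dim = pred_masked.shape[-1]

    full = torch.zeros(batch, num_patches, patch_dim, device = device, dtype = pred_masked.dtype)
    batch_range = torch.arange(batch, device = device)[:, None]

    full[batch_range, unmasked_indices] = unmasked   # visible patches go back to their original positions
    full[batch_range, masked_indices] = pred_masked  # reconstructed patches fill the masked positions

    # `full` is still a sequence of flattened patches - fold it back into an image,
    # e.g. rearrange(full, 'b (h w) (p1 p2 c) -> b c (h p1) (w p2)', ...) with einops
    return full
```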

Hi Rhinigtas! Could you show what your full training script looks like? Perhaps I can spot the error more easily that way

@xinchenduobian sure, try DeiT, or any of the models that have some hierarchical pooling mixed in

@marcomameli1992 hi Marco, you just need to modify ViT to have a return statement here https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py#L123 for the embeddings. I guess I could add this, but I don't want to...
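Roughly what that modification could look like - a sketch that subclasses ViT rather than editing it in place (attribute names follow the current vit.py and may differ across versions):

```python
import torch
from einops import repeat
from vit_pytorch import ViT

class ViTWithEmbeddings(ViT):
    # same forward pass as vit.py, plus an early return of the token embeddings
    def forward(self, img, return_embeddings = False):
        x = self.to_patch_embedding(img)
        b, n, _ = x.shape

        cls_tokens = repeat(self.cls_token, '1 1 d -> b 1 d', b = b)
        x = torch.cat((cls_tokens, x), dim = 1)
        x += self.pos_embedding[:, :(n + 1)]
        x = self.dropout(x)

        x = self.transformer(x)

        if return_embeddings:
            return x  # per-token embeddings, before pooling and the MLP head

        x = x.mean(dim = 1) if self.pool == 'mean' else x[:, 0]
        x = self.to_latent(x)
        return self.mlp_head(x)
```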

@marcomameli1992 what do you mean by the positional embedding? The absolute positional embeddings are added at the beginning, before the patch sequence is fed through the attention layers, and can be accessed...
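A minimal sketch of reading that parameter off a stock ViT (hyperparameters are just the README example values):

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 16,
    mlp_dim = 2048
)

pos_emb = v.pos_embedding  # learned parameter, shape (1, num_patches + 1, dim)
print(pos_emb.shape)       # torch.Size([1, 65, 1024]) - the extra position is for the CLS token
```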

@marcomameli1992 actually, let me just write up a layer extractor that can wrap the ViT and return all these intermediates, similar to https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/recorder.py
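For reference, the Recorder pattern being pointed to wraps an existing ViT and returns the attention maps alongside the predictions, roughly:

```python
import torch
from vit_pytorch import ViT
from vit_pytorch.recorder import Recorder

v = ViT(image_size = 256, patch_size = 32, num_classes = 1000,
        dim = 1024, depth = 6, heads = 16, mlp_dim = 2048)
v = Recorder(v)  # wraps the model without modifying it

img = torch.randn(1, 3, 256, 256)
preds, attns = v(img)  # attns: (batch, layers, heads, patches + 1, patches + 1)
```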

@marcomameli1992 does this work for you? https://github.com/lucidrains/vit-pytorch/tree/0.25.1#accessing-embeddings
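For anyone landing here later, the linked README section comes down to wrapping the model with the Extractor, roughly:

```python
import torch
from vit_pytorch import ViT
from vit_pytorch.extractor import Extractor

v = ViT(image_size = 256, patch_size = 32, num_classes = 1000,
        dim = 1024, depth = 6, heads = 16, mlp_dim = 2048)
v = Extractor(v)

img = torch.randn(1, 3, 256, 256)
logits, embeddings = v(img)  # embeddings: (batch, patches + 1, dim), including the CLS token
```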

@yzb1997 hey, this may be helpful https://github.com/lucidrains/vit-pytorch/issues/169

@dnecho ohh yes, I'm actually not so sure about that - you may be right that it isn't necessary for the unmasked tokens

@dnecho it probably wouldn't hurt to keep it the way it is