Phil Wang

Results: 1,505 comments by Phil Wang

@EelcoHoogendoorn oh got it! yeah, I think there are a couple of ways to do depthwise conv, fixing this!
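For context, a depthwise convolution applies one single-channel filter per input channel (in PyTorch this is an ordinary conv with `groups` equal to the channel count, e.g. `nn.Conv2d(dim, dim, k, groups = dim)`). A minimal pure-Python sketch of the idea, purely illustrative and not the library's implementation:

```python
def depthwise_conv2d(image, kernels):
    """Depthwise 2D convolution with valid padding.

    image:   [C][H][W] nested lists (one 2D plane per channel)
    kernels: [C][k][k]  one k x k kernel per channel
    returns: [C][H-k+1][W-k+1]

    Unlike a standard convolution, channels are never mixed:
    channel c of the output depends only on channel c of the input.
    """
    channels = len(image)
    k = len(kernels[0])
    out = []
    for c in range(channels):
        h, w = len(image[c]), len(image[c][0])
        plane = [[sum(image[c][i + di][j + dj] * kernels[c][di][dj]
                      for di in range(k) for dj in range(k))
                  for j in range(w - k + 1)]
                 for i in range(h - k + 1)]
        out.append(plane)
    return out
```

In practice this per-channel conv is usually followed by a 1x1 "pointwise" conv to mix channels, which is the other common way to structure it.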

@EelcoHoogendoorn ok, let me know if https://github.com/lucidrains/vit-pytorch/releases/tag/0.16.13 looks good to you

@Tato14 Hi Joan! Seems like the approach came from https://arxiv.org/pdf/2005.00928.pdf. I'll have to read it after I get through my queue of papers this week to see how difficult it...

@Tato14 the naive attention map for individual layers is this variable `attn` https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py#L56
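For readers tracing that variable: `attn` is the row-softmax of the scaled query-key dot products, an n x n matrix where row i gives how much token i attends to every other token. A minimal pure-Python sketch of how such an attention map is computed (illustrative only, not the library code):

```python
import math

def attention_map(q, k):
    """q, k: [n][d] query/key vectors for n tokens.

    Returns the n x n attention matrix: scaled dot products,
    softmaxed over each row -- the analogue of `attn` above.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)

    # scaled dot-product scores: scores[i][j] = (q_i . k_j) / sqrt(d)
    scores = [[scale * sum(qi * ki for qi, ki in zip(qrow, krow))
               for krow in k]
              for qrow in q]

    # numerically stable softmax over each row
    attn = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        attn.append([e / z for e in exps])
    return attn
```

To visualize attention for a given layer, you would capture this matrix on the forward pass (e.g. with a forward hook) and reshape a chosen row back onto the patch grid.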

@abeyang00 Hi Abe! So actually ViT already supports images of different sizes, as long as the height and width are divisible by the patch size, and both height and width...

@abeyang00 the image size there is just the max image size along one dimension - on forward, it still calculates the number of tokens and positionally embeds them correctly https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py#L107
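A small sketch of the token-count bookkeeping described above (function name hypothetical; it only mirrors the behavior, not the actual vit.py code):

```python
def num_patch_tokens(height, width, patch_size):
    """Number of patch tokens a ViT produces for a given image size.

    Both dimensions must be divisible by the patch size; the positional
    embedding is allocated for the maximum token count and sliced down
    to (n + 1) entries on forward (the +1 is the CLS token).
    """
    assert height % patch_size == 0 and width % patch_size == 0, \
        'height and width must be divisible by the patch size'
    return (height // patch_size) * (width // patch_size)
```

So a model built with `image_size = 256` and `patch_size = 32` can still take a 224 x 160 image: it just yields fewer tokens, and the positional embedding is sliced to match.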

@hussam789 sounds good! in 0.7.6 you can do

```python
import torch
from vit_pytorch.t2t import T2TViT
from performer_pytorch import Performer

performer = Performer(
    dim = 512,
    depth = 2,
    heads = ...
```

You just need the image size to be `max(height, width)`

Ross's framework should allow you to use ViT and download the pretrained models with one command