Phil Wang
@EelcoHoogendoorn oh got it! yeah, I think there are a couple of ways to do depthwise conv, fixing this!
@EelcoHoogendoorn ok, let me know if https://github.com/lucidrains/vit-pytorch/releases/tag/0.16.13 looks good to you
@Tato14 Hi Joan! It seems the approach came from https://arxiv.org/pdf/2005.00928.pdf. I'll have to read it after I get through my queue of papers this week to see how difficult it...
@Tato14 the naive attention map for individual layers is this variable `attn` https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit_pytorch.py#L56
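For anyone who wants to poke at it without tracing the library source, here is a minimal pure-Python sketch of what that `attn` variable holds: the softmax-normalized scaled dot-product scores, one row per query token. The helper names and the toy numbers are made up for illustration; this is not the library's code.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_map(q, k, dim_head):
    # attn[i][j] = softmax over j of (q_i . k_j / sqrt(dim_head))
    scale = dim_head ** -0.5
    scores = [[scale * sum(qi * kj for qi, kj in zip(qrow, krow)) for krow in k]
              for qrow in q]
    return [softmax(row) for row in scores]

# toy 3-token sequence with dim_head = 2
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attn = attention_map(q, k, dim_head = 2)
# each row of attn is a probability distribution over the 3 tokens
```

Each row sums to 1, so visualizing a row shows how much one token attends to every other token in that layer.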
@abeyang00 Hi Abe! So actually ViT already supports images of different sizes, as long as the height and width are divisible by the patch size, and both height and width...
@abeyang00 the image size there is just the max image size along one dimension - on forward, it still calculates the number of tokens and positionally embeds it correctly https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py#L107
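A rough sketch of that bookkeeping, just to make the idea concrete (the helper below is hypothetical, not the library's actual code): the positional embedding is allocated for the maximum image size, and on forward only the first `n + 1` entries are used (the extra one being the CLS token).

```python
def num_patch_tokens(height, width, patch_size):
    # ViT splits the image into non-overlapping patches; both dims
    # must divide evenly by the patch size
    assert height % patch_size == 0 and width % patch_size == 0
    return (height // patch_size) * (width // patch_size)

# embedding table sized for the max image size (256 x 256, patch 32)
max_tokens = num_patch_tokens(256, 256, 32)   # 8 * 8 = 64 patches
pos_emb = list(range(max_tokens + 1))         # stand-in for the learned embedding

# a smaller 256 x 128 input just uses fewer entries
n = num_patch_tokens(256, 128, 32)            # 8 * 4 = 32 patches
used = pos_emb[: n + 1]                       # CLS token + n patch tokens
```

So the `image_size` argument only caps the embedding table; any divisible height/width within it works at forward time.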
@hussam789 sounds good! in 0.7.6 you can do

```python
import torch
from vit_pytorch.t2t import T2TViT
from performer_pytorch import Performer

performer = Performer(
    dim = 512,
    depth = 2,
    heads =...
```
You just need the image size to be max(height, width)
https://github.com/lucidrains/TimeSformer-pytorch
Ross' framework (timm) should allow you to use ViT and download the pretrained models with one command