Phil Wang

Results: 1479 comments of Phil Wang

@KyleZheng1997 Hi Kyle, so there is one gotcha when training attention networks with Adam, and that is we exclude the parameters of the LayerNorm from weight decay. Or you can...
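A minimal sketch of one common way to do that exclusion, using PyTorch parameter groups with AdamW (the `param_groups_weight_decay` helper and the example model are illustrative, not part of vit-pytorch):

```python
import torch
from torch import nn

def param_groups_weight_decay(model, weight_decay = 0.01):
    # LayerNorm weights and all biases are 1-dimensional,
    # so skipping weight decay for ndim <= 1 covers them
    decay, no_decay = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        (no_decay if param.ndim <= 1 else decay).append(param)
    return [
        {'params': decay, 'weight_decay': weight_decay},
        {'params': no_decay, 'weight_decay': 0.}
    ]

# any attention network works here; a stock encoder just for demonstration
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model = 64, nhead = 4),
    num_layers = 2
)

optim = torch.optim.AdamW(param_groups_weight_decay(model), lr = 3e-4)
```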

@lwomalley Hi Logan! I know of FastAI but am not too familiar with their API. What would it take to be compatible?

@lwomalley yup, but the problem is the distillation comes with an auxiliary loss that gets returned. Will FastAI know to add this to the main loss it calculates?

@tyoc213 yeah, it won't work, because that line needs to also add the distillation loss https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/distill.py#L120 I could return the logits as the first element of the tuple, and the...

@tyoc213 I see, so I'd have to store the auxiliary loss on the instance somewhere? And then in the callback it would be fetched and added to the main loss?
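A minimal sketch of such a callback under that scheme, assuming the model stashes its auxiliary loss on a hypothetical `distill_loss` attribute during the forward pass (fastai v2 `Callback` API):

```python
from fastai.callback.core import Callback

class DistillLossCallback(Callback):
    # runs after fastai computes the main loss for the batch
    def after_loss(self):
        # 'distill_loss' is an assumed attribute the model would set in forward
        aux = getattr(self.learn.model, 'distill_loss', None)
        if aux is not None:
            # reassign rather than add in-place, so autograd
            # backpropagates through the combined loss
            self.learn.loss_grad = self.learn.loss_grad + aux
```

The callback would then be passed to the `Learner` via its `cbs` argument so the auxiliary loss is folded in on every training batch.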

@tyoc213 do you have any code examples of how you are using `ViT` with FastAI?

Try increasing your dimensions to 512. Also increase the k to 256 at the very least.

@yueruchen Hi Yifan! If you make sure that both of your image sizes are divisible by the patch size, then as long as you instantiate `ViT` with `image_size` as the maximum image...

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 512,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = ...
)
```
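For reference, a runnable version of that truncated snippet; the values for `mlp_dim` and the dropouts are assumptions filling in the cut-off arguments, and the forward passes sketch the variable-image-size usage described above (both 512 and 256 are divisible by the patch size of 32):

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 512,      # the maximum image size
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = 2048,        # assumed value for the truncated argument
    dropout = 0.1,         # assumed
    emb_dropout = 0.1      # assumed
)

# the same model can consume either resolution,
# since both are divisible by the patch size
preds_large = v(torch.randn(1, 3, 512, 512))  # (1, 1000)
preds_small = v(torch.randn(1, 3, 256, 256))  # (1, 1000)
```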

The CLS token is passed through the layers of attention and aggregates information from the rest of the tokens as it makes its way up.
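A condensed sketch of that mechanism; the names (`patch_tokens`, the stock encoder standing in for the ViT's transformer) are illustrative, not the library's actual internals:

```python
import torch
from torch import nn

batch, num_patches, dim = 8, 64, 256

cls_token = nn.Parameter(torch.randn(1, 1, dim))     # learned CLS embedding
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model = dim, nhead = 8, batch_first = True),
    num_layers = 6
)
mlp_head = nn.Linear(dim, 1000)

patch_tokens = torch.randn(batch, num_patches, dim)  # embedded image patches

# prepend the CLS token; self-attention at every layer lets it
# aggregate information from all the patch tokens on the way up
x = torch.cat((cls_token.expand(batch, -1, -1), patch_tokens), dim = 1)
x = transformer(x)

# only the CLS position (index 0) is read out for classification
logits = mlp_head(x[:, 0])
```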