Phil Wang

Results: 1479 comments of Phil Wang

@KyleZheng1997 Hi Kyle, so there is one gotcha when training attention networks with Adam, and that is we exclude the parameters of the LayerNorm from weight decay. Or you can...
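A minimal sketch of one common way to do that exclusion, using PyTorch parameter groups with AdamW (the `param_groups_weight_decay` helper and the example model are illustrative, not part of vit-pytorch):

```python
import torch
from torch import nn

def param_groups_weight_decay(model, weight_decay = 0.01):
    # LayerNorm weights and all biases are 1-dimensional,
    # so skipping weight decay for ndim <= 1 covers them
    decay, no_decay = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        (no_decay if param.ndim <= 1 else decay).append(param)
    return [
        {'params': decay, 'weight_decay': weight_decay},
        {'params': no_decay, 'weight_decay': 0.}
    ]

# any attention network works here; a stock encoder just for demonstration
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model = 64, nhead = 4),
    num_layers = 2
)

optim = torch.optim.AdamW(param_groups_weight_decay(model), lr = 3e-4)
```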

@lwomalley Hi Logan! I know of FastAI but am not too familiar with their API. What would it take to be compatible?

@lwomalley yup, but the problem is the distillation comes with an auxiliary loss that gets returned. Will FastAI know to add this to the main loss it calculates?

@tyoc213 yeah, it won't work, because that line needs to also add the distillation loss https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/distill.py#L120 I could return the logits as the first element of the tuple, and the...

@tyoc213 I see, so I'd have to store the auxiliary loss on the instance somewhere? And then in the callback it would be fetched and added to the main loss?
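A minimal sketch of such a callback under that scheme, assuming the model stashes its auxiliary loss on a hypothetical `distill_loss` attribute during the forward pass (fastai v2 `Callback` API):

```python
from fastai.callback.core import Callback

class DistillLossCallback(Callback):
    # runs after fastai computes the main loss for the batch
    def after_loss(self):
        # 'distill_loss' is an assumed attribute the model would set in forward
        aux = getattr(self.learn.model, 'distill_loss', None)
        if aux is not None:
            # reassign rather than add in-place, so autograd
            # backpropagates through the combined loss
            self.learn.loss_grad = self.learn.loss_grad + aux
```

The callback would then be passed to the `Learner` via its `cbs` argument so the auxiliary loss is folded in on every training batch.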

@tyoc213 do you have any code examples of how you are using `ViT` with FastAI?

Try increasing your dimensions to 512. Also increase the k to 256 at the very least.

@yueruchen Hi Yifan! If you make sure that both of your image sizes are divisible by the patch size, then as long as you instantiate `ViT` with `image_size` as the maximum image...

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 512,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = ...
)
```
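For reference, a runnable version of that truncated snippet; the values for `mlp_dim` and the dropouts are assumptions filling in the cut-off arguments, and the forward passes sketch the variable-image-size usage described above (both 512 and 256 are divisible by the patch size of 32):

```python
import torch
from vit_pytorch import ViT

v = ViT(
    image_size = 512,      # the maximum image size
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = 2048,        # assumed value for the truncated argument
    dropout = 0.1,         # assumed
    emb_dropout = 0.1      # assumed
)

# the same model can consume either resolution,
# since both are divisible by the patch size
preds_large = v(torch.randn(1, 3, 512, 512))  # (1, 1000)
preds_small = v(torch.randn(1, 3, 256, 256))  # (1, 1000)
```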

The CLS token is passed through the layers of attention and aggregates information from the rest of the tokens as it makes its way up.
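A condensed sketch of that mechanism; the names (`patch_tokens`, the stock encoder standing in for the ViT's transformer) are illustrative, not the library's actual internals:

```python
import torch
from torch import nn

batch, num_patches, dim = 8, 64, 256

cls_token = nn.Parameter(torch.randn(1, 1, dim))     # learned CLS embedding
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model = dim, nhead = 8, batch_first = True),
    num_layers = 6
)
mlp_head = nn.Linear(dim, 1000)

patch_tokens = torch.randn(batch, num_patches, dim)  # embedded image patches

# prepend the CLS token; self-attention at every layer lets it
# aggregate information from all the patch tokens on the way up
x = torch.cat((cls_token.expand(batch, -1, -1), patch_tokens), dim = 1)
x = transformer(x)

# only the CLS position (index 0) is read out for classification
logits = mlp_head(x[:, 0])
```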