emaad
Has anyone had success visualizing attention weights for image and text tokens? I'm really interested in seeing why the model is selecting tokens.
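Not the repo's own code, but a minimal sketch of the usual approach: grab the pre-softmax scores for one head, softmax them, and reshape a query token's row back into the spatial token grid so it can be plotted as a heat map. All shapes and names here (16 image tokens as a 4x4 grid, head dim 32) are illustrative assumptions.

```python
import numpy as np

def attention_map(q, k):
    """Softmax attention weights for a single head: shape (T_q, T_k)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

# Assumed toy setup: 16 image tokens (a 4x4 grid) attending to themselves.
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 32))
k = rng.standard_normal((16, 32))
attn = attention_map(q, k)

# Row i is "what token i attends to"; reshape it back into the token grid
# and heat-map it (e.g. with matplotlib's imshow).
grid = attn[5].reshape(4, 4)
print(attn.shape)  # (16, 16)
```

In a real PyTorch model you would get `q`/`k` (or the weights directly) by registering a forward hook on the attention module, but the exact module names depend on the implementation.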
In the training loop we have:
```
imgs = imgs.to(device=args.device)
logits, target = self.model(imgs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
loss.backward()
```
However, the output of the transformer is:
```
_,...
```
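For context on what the two `reshape` calls in that loss are doing: they collapse the batch and sequence dimensions so cross-entropy is computed independently per token. A numpy sketch under assumed shapes (batch `B`, sequence `T`, vocab `V`), not the repo's actual model:

```python
import numpy as np

B, T, V = 2, 4, 8  # assumed batch size, sequence length, vocab size
rng = np.random.default_rng(0)
logits = rng.standard_normal((B, T, V))
target = rng.integers(0, V, size=(B, T))

# Flatten to (B*T, V) logits and (B*T,) targets: each token becomes one
# independent classification example, exactly what F.cross_entropy expects.
flat_logits = logits.reshape(-1, V)
flat_target = target.reshape(-1)

# Per-token cross-entropy: -log softmax(logits)[target], averaged.
shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(B * T), flat_target].mean()
print(flat_logits.shape, flat_target.shape)  # (8, 8) (8,)
```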
The cosine schedule calculates the number of tokens that are UNMASKED.
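A sketch of how that count can be derived, assuming the MaskGIT-style schedule where the mask ratio is gamma(r) = cos(pi/2 * r) for decoding progress r in [0, 1]; the function name and the ceil rounding are assumptions, not necessarily this repo's exact implementation:

```python
import math

def cosine_unmasked(step, total_steps, num_tokens):
    """Number of tokens UNMASKED at a given decoding step.

    Assumes a MaskGIT-style cosine mask ratio gamma(r) = cos(pi/2 * r):
    at r=0 (nearly) everything is masked, and the masked count shrinks
    to ~0 as r approaches 1.
    """
    r = step / total_steps
    masked = math.ceil(math.cos(math.pi / 2 * r) * num_tokens)
    return num_tokens - masked

# Unmasked counts grow slowly at first, then quickly near the end.
for s in range(0, 9, 2):
    print(s, cosine_unmasked(s, 8, 256))
```

The slow start is deliberate: early steps commit only a few high-confidence tokens, and later steps fill in the rest once more context is visible.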