emaad
Has anyone had success visualizing attention weights for image and text tokens? I'm really interested in seeing why the model is selecting tokens.
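Not the repo's own code, but a minimal sketch of the usual approach: grab the pre-softmax scores for one head, softmax them, and reshape a query token's row back into the spatial token grid so it can be plotted as a heat map. All shapes and names here (16 image tokens as a 4x4 grid, head dim 32) are illustrative assumptions.

```python
import numpy as np

def attention_map(q, k):
    """Softmax attention weights for a single head: shape (T_q, T_k)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

# Assumed toy setup: 16 image tokens (a 4x4 grid) attending to themselves.
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 32))
k = rng.standard_normal((16, 32))
attn = attention_map(q, k)

# Row i is "what token i attends to"; reshape it back into the token grid
# and heat-map it (e.g. with matplotlib's imshow).
grid = attn[5].reshape(4, 4)
print(attn.shape)  # (16, 16)
```

In a real PyTorch model you would get `q`/`k` (or the weights directly) by registering a forward hook on the attention module, but the exact module names depend on the implementation.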
In the training loop we have:
```
imgs = imgs.to(device=args.device)
logits, target = self.model(imgs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
loss.backward()
```
However, the output of the transformer is:
```
_,...
```
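For context on what the two `reshape` calls in that loss are doing: they collapse the batch and sequence dimensions so cross-entropy is computed independently per token. A numpy sketch under assumed shapes (batch `B`, sequence `T`, vocab `V`), not the repo's actual model:

```python
import numpy as np

B, T, V = 2, 4, 8  # assumed batch size, sequence length, vocab size
rng = np.random.default_rng(0)
logits = rng.standard_normal((B, T, V))
target = rng.integers(0, V, size=(B, T))

# Flatten to (B*T, V) logits and (B*T,) targets: each token becomes one
# independent classification example, exactly what F.cross_entropy expects.
flat_logits = logits.reshape(-1, V)
flat_target = target.reshape(-1)

# Per-token cross-entropy: -log softmax(logits)[target], averaged.
shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(B * T), flat_target].mean()
print(flat_logits.shape, flat_target.shape)  # (8, 8) (8,)
```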
The cosine schedule calculates the number of tokens that are UNMASKED.
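A sketch of how that count can be derived, assuming the MaskGIT-style schedule where the mask ratio is gamma(r) = cos(pi/2 * r) for decoding progress r in [0, 1]; the function name and the ceil rounding are assumptions, not necessarily this repo's exact implementation:

```python
import math

def cosine_unmasked(step, total_steps, num_tokens):
    """Number of tokens UNMASKED at a given decoding step.

    Assumes a MaskGIT-style cosine mask ratio gamma(r) = cos(pi/2 * r):
    at r=0 (nearly) everything is masked, and the masked count shrinks
    to ~0 as r approaches 1.
    """
    r = step / total_steps
    masked = math.ceil(math.cos(math.pi / 2 * r) * num_tokens)
    return num_tokens - masked

# Unmasked counts grow slowly at first, then quickly near the end.
for s in range(0, 9, 2):
    print(s, cosine_unmasked(s, 8, 256))
```

The slow start is deliberate: early steps commit only a few high-confidence tokens, and later steps fill in the rest once more context is visible.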