Weiqian Chen

Results: 3 issues by Weiqian Chen

The softmax scaling should use the per-head dimension, i.e. it should be `attention = torch.softmax(energy / (self.head_dim ** (1 / 2)), dim=3)`
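A minimal sketch of the point above: in scaled dot-product attention, the `QK^T` scores should be divided by the square root of the per-head dimension (`head_dim`), not of the full embedding size. The tensor shapes and the names `energy` and `head_dim` follow the snippet in the issue; the concrete sizes here are illustrative, not from the repo.

```python
import torch

# Illustrative shapes: (batch, heads, query_len, key_len)
N, heads, query_len, key_len, head_dim = 2, 4, 5, 5, 8
energy = torch.randn(N, heads, query_len, key_len)  # raw QK^T scores

# Correct scaling: divide by sqrt(head_dim), the per-head dimension,
# so the variance of the scores stays roughly constant per head.
attention = torch.softmax(energy / (head_dim ** (1 / 2)), dim=3)

# Softmax over dim=3 makes each row of key weights sum to 1.
print(torch.allclose(attention.sum(dim=3), torch.ones(N, heads, query_len)))
```

Dividing by `embed_size ** 0.5` instead would over-shrink the scores (since `embed_size = heads * head_dim`), flattening the softmax distribution.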

I wonder why the Retrieval accuracy is almost 20% higher than that of the official JAX/FLAX implementation. As the paper says, "While we achieve consistent results reported in (Tay et al. 2020)...

Dear author, your work is excellent! I'm very interested in your training scripts and would like to run some experiments with them. Please consider releasing the full code.