yoyodyne
Disables `norm_first` in the transformer encoder
With `norm_first` disabled, the encoder can use PyTorch's experimental, and reportedly faster, nested tensor API:
https://pytorch.org/docs/stable/nested.html
According to that documentation, nested tensors are particularly helpful when the input is padded.
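For reviewers, here is a minimal sketch of the configuration this change targets. It is not the yoyodyne code itself: the layer sizes, variable names, and toy batch are assumptions made for illustration. Whether PyTorch actually takes the nested tensor path depends on the installed version and on its other fast-path conditions, such as running in inference mode.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
layer = nn.TransformerEncoderLayer(
    d_model=16,
    nhead=4,
    dim_feedforward=32,
    batch_first=True,
    norm_first=False,  # Post-norm, as in this PR.
)
encoder = nn.TransformerEncoder(
    layer, num_layers=2, enable_nested_tensor=True
)
encoder.eval()  # The fast path is only considered at inference time.

# Toy batch: two sequences of lengths 5 and 3, padded to length 5.
source = torch.randn(2, 5, 16)
padding_mask = torch.tensor(
    [
        [False, False, False, False, False],
        [False, False, False, True, True],  # True marks padding.
    ]
)

with torch.no_grad():
    encoded = encoder(source, src_key_padding_mask=padding_mask)
```

With `norm_first=True`, recent PyTorch versions warn that nested tensors cannot be used and fall back to the regular padded path.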
Closes #214. Mutually exclusive with #225.