Masked MLP
🚀 Feature
An MLP implementation that efficiently avoids computation for masked tokens.
Motivation
When padding is very unequal across a batch, a large share of the MLP's compute is wasted producing outputs for padding tokens.
Pitch
Ideally, this MLP would skip the computation for masked tokens entirely and spend compute only on the useful tokens.
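To illustrate the idea, here is a minimal sketch (using NumPy as a stand-in, not the actual xformers API this issue is requesting): gather the unmasked tokens into a dense buffer, run the two-layer MLP only on those, and scatter the results back into place. The function name `masked_mlp` and the weight layout are assumptions for the sketch.

```python
import numpy as np

def masked_mlp(x, mask, w1, b1, w2, b2):
    """Apply a 2-layer MLP only where mask is True (hypothetical sketch).

    x:    (batch, seq, dim) token embeddings
    mask: (batch, seq) boolean, True for real tokens, False for padding
    """
    out = np.zeros((x.shape[0], x.shape[1], w2.shape[1]), dtype=x.dtype)
    # Gather only the unmasked tokens into a dense (n_real, dim) buffer.
    tokens = x[mask]
    # Standard MLP on the real tokens only: linear -> ReLU -> linear.
    hidden = np.maximum(tokens @ w1 + b1, 0.0)
    # Scatter the results back; padding positions stay zero.
    out[mask] = hidden @ w2 + b2
    return out
```

The gather/scatter costs are linear in sequence length, while the matmuls dominate, so the savings scale with the fraction of padding in the batch.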
Alternatives
The usual alternative is packing. It works well for certain use cases, but there are still scenarios where packing is not possible, and it is quite cumbersome to manage. It is also less optimal, since some padding will always remain.
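For contrast, packing concatenates variable-length sequences into one flat buffer and tracks offsets to recover them; a minimal sketch (helper names `pack`/`unpack` are hypothetical) shows the bookkeeping this issue describes as cumbersome:

```python
import numpy as np

def pack(seqs):
    """Concatenate variable-length sequences into one flat buffer,
    recording cumulative offsets so each sequence can be recovered."""
    lengths = [len(s) for s in seqs]
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    flat = np.concatenate(seqs, axis=0)
    return flat, offsets

def unpack(flat, offsets):
    """Split the flat buffer back into the original sequences."""
    return [flat[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
```

Every operation downstream must carry the offsets along, which is the management burden mentioned above; a masked MLP would avoid that entirely.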