xformers
BlockDiagonalMask with T5 bias
❓ Questions and Help
BlockDiagonalMask performs very well on variable-sequence-length inputs, so it is a great help for speeding up training. However, I cannot find a way to combine BlockDiagonalMask with an additive attention bias such as the T5 relative position bias, which gives additional gains.
Is this already supported somewhere?
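To make the request concrete, here is a minimal pure-Python sketch of the dense bias I have in mind: `-inf` outside each sequence's block, and a relative-position bias inside it. The function name `block_diagonal_bias`, its parameters, and the simple clipped-distance lookup are my own illustration; they are not part of xformers, and real T5 uses learned, log-spaced buckets per head rather than this clipping.

```python
import math

def block_diagonal_bias(seqlens, rel_bias, max_distance):
    """Dense additive attention bias for sequences packed back to back.

    Hypothetical helper for illustration only (not an xformers API).

    seqlens      : lengths of the packed sequences, e.g. [2, 3]
    rel_bias     : 2 * max_distance + 1 floats, indexed by the clipped
                   relative position (key_pos - query_pos) + max_distance
    max_distance : relative positions are clipped to
                   [-max_distance, max_distance] (a simplification of
                   T5's bucketing)
    """
    total = sum(seqlens)
    # Start fully masked: -inf removes a (query, key) pair after softmax.
    bias = [[-math.inf] * total for _ in range(total)]
    start = 0
    for n in seqlens:
        # Fill only the diagonal block that belongs to this sequence.
        for q in range(n):
            for k in range(n):
                rel = max(-max_distance, min(max_distance, k - q))
                bias[start + q][start + k] = rel_bias[rel + max_distance]
        start += n
    return bias

# Two packed sequences of lengths 2 and 3, one bias value per relative offset.
bias = block_diagonal_bias([2, 3], [-0.2, -0.1, 0.0, 0.1, 0.2], max_distance=2)
```

As far as I understand, a dense matrix like this could be passed to `memory_efficient_attention` as a plain tensor `attn_bias`, but that materializes the full (total length × total length) matrix and gives up the benefit of BlockDiagonalMask, which is why I am asking for a combined option.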
Thanks.