cosFormer
Why is the attn mask not used in the forward function?
Compared with the left_product function, the attention mask is not used in the forward() function.
How to use the attention mask in the forward method?
When you use the forward() function, there is no direct way to apply an attention mask, because the attention matrix is never computed. If you need an attention mask, we suggest using left_product instead; however, this comes at a cost in efficiency.
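To illustrate the point, here is a minimal pure-Python sketch, not the repository's actual code. It assumes a ReLU feature map and leaves out the cos re-weighting for brevity. The names `linear_attention` and `quadratic_attention` are hypothetical stand-ins for the two computation orders and do not mirror the signatures of forward() or left_product. The linear order computes K^T V first, so no N x N matrix ever exists to be masked; the quadratic order builds the N x N scores, where a mask can be applied entry by entry.

```python
def relu(x):
    return [[max(val, 0.0) for val in row] for row in x]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def linear_attention(q, k, v):
    """Q (K^T V): cost linear in sequence length, but no N x N matrix to mask."""
    q, k = relu(q), relu(k)
    kv = matmul(transpose(k), v)            # (d x d_v) summary of keys and values
    k_sum = [sum(col) for col in zip(*k)]   # (d,) used for normalization
    out = []
    for q_row, num in zip(q, matmul(q, kv)):
        denom = sum(a * b for a, b in zip(q_row, k_sum))
        out.append([x / denom for x in num])
    return out

def quadratic_attention(q, k, v, mask=None):
    """(Q K^T) V: builds the N x N scores, so a mask can zero out entries."""
    q, k = relu(q), relu(k)
    scores = matmul(q, transpose(k))        # (N x N) attention scores
    if mask is not None:
        # mask[i][j] is truthy where query i may attend to key j
        scores = [[s if m else 0.0 for s, m in zip(s_row, m_row)]
                  for s_row, m_row in zip(scores, mask)]
    out = []
    for s_row in scores:
        denom = sum(s_row)
        out.append([sum(s * v_row[c] for s, v_row in zip(s_row, v)) / denom
                    for c in range(len(v[0]))])
    return out

# Toy inputs (made up for illustration): 3 tokens, feature size 2.
q = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]]
k = [[1.0, 0.5], [0.2, 1.0], [1.5, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal_mask = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]

unmasked_linear = linear_attention(q, k, v)
unmasked_quadratic = quadratic_attention(q, k, v)
masked = quadratic_attention(q, k, v, mask=causal_mask)
```

Without a mask the two orders give the same result up to floating-point error, which is why the linear order is preferred when no mask is needed. With the causal mask, the first token can only attend to itself, so its output equals its own value vector; this is only expressible in the quadratic order, at O(N^2) cost.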