cosFormer

Official implementation of the cosFormer attention mechanism from "cosFormer: Rethinking Softmax in Attention".
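
For orientation, here is a minimal sketch (not the repository's exact API; shapes and function name are assumptions) of the non-causal cosFormer attention the paper describes: ReLU feature maps plus a cos-based re-weighting that, via cos(x − y) = cos(x)cos(y) + sin(x)sin(y), keeps the computation linear in sequence length.

```python
import math
import torch

def cosformer_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, seq_len, dim). Returns (batch, seq_len, dim)."""
    b, n, d = q.shape
    # Non-negative feature maps, as in the paper.
    q, k = torch.relu(q), torch.relu(k)
    # Position-dependent weights cos(pi/2 * i/M) and sin(pi/2 * i/M);
    # here M is taken as the sequence length (an assumption of this sketch).
    idx = torch.arange(1, n + 1, device=q.device, dtype=q.dtype)
    m = float(n)
    cos_w = torch.cos(math.pi / 2 * idx / m)[None, :, None]
    sin_w = torch.sin(math.pi / 2 * idx / m)[None, :, None]
    q_cos, q_sin = q * cos_w, q * sin_w
    k_cos, k_sin = k * cos_w, k * sin_w
    # Contract K^T V first, so cost is O(n * d^2) rather than O(n^2 * d).
    kv_cos = torch.einsum('bnd,bne->bde', k_cos, v)
    kv_sin = torch.einsum('bnd,bne->bde', k_sin, v)
    num = torch.einsum('bnd,bde->bne', q_cos, kv_cos) + \
          torch.einsum('bnd,bde->bne', q_sin, kv_sin)
    # Normalizer: the same contraction with V replaced by a vector of ones.
    z = torch.einsum('bnd,bd->bn', q_cos, k_cos.sum(dim=1)) + \
        torch.einsum('bnd,bd->bn', q_sin, k_sin.sum(dim=1))
    return num / (z[..., None] + eps)
```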

5 cosFormer issues

We are examining non-NLP applications of the cosFormer self-attention and would need to use attention masking for the padded tokens in a batch. Is there a way to incorporate this...
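
One way padding masks could be handled in linear (cosFormer-style) attention is to zero out the feature-mapped keys (and values) at padded positions before the K^T V contraction, so padded tokens contribute nothing. This is a hedged sketch, not the repository's code; `key_padding_mask` is an assumed name, True where a token is padding.

```python
import torch

def apply_key_padding_mask(k_feat, v, key_padding_mask):
    """k_feat, v: (batch, seq_len, dim); key_padding_mask: (batch, seq_len) bool."""
    keep = (~key_padding_mask)[..., None].to(k_feat.dtype)  # 1.0 for real tokens
    return k_feat * keep, v * keep
```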

When implementing cosFormer in the MultiHeadAttention of Transformer-XL and running without extra long-range memory, ReLU performs worse than ELU. I think it is because the attention and FF net...
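
For context, these are the two feature maps typically being compared: cosFormer uses ReLU, while the linear-attention baseline of Katharopoulos et al. uses elu(x) + 1. Both keep the mapped queries and keys non-negative, but elu(x) + 1 stays strictly positive, which can matter when many pre-activations are negative. A small sketch:

```python
import torch.nn.functional as F

def relu_feature_map(x):
    # cosFormer: zeroes out negative pre-activations entirely.
    return F.relu(x)

def elu_feature_map(x):
    # Linear-attention baseline: strictly positive, smooth near zero.
    return F.elu(x) + 1.0
```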

The original code adds an int type and a function type together.

In the paper, it is mentioned that bidirectional language modeling pre-training has been done. Are you planning to release pre-trained weights for the model?

Compared with the `left_product` function, the attention mask is not used in the `forward()` function. How can the attention mask be used in the forward method?
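
A sketch of the contrast being asked about, under assumed shapes and simplified normalization (not the repo's exact code): a `left_product`-style reference materializes the full n × n weight matrix, so a mask can be multiplied in directly, whereas the linear/causal form replaces that matrix with running sums over past keys, so causality is enforced by the cumulative sum itself rather than by an explicit mask argument. Inputs are assumed to be non-negative feature-mapped queries and keys.

```python
import torch

def left_product(q, k, v, attn_mask=None, eps=1e-6):
    """Quadratic reference: explicit (n, n) weights, mask applied directly."""
    weights = torch.einsum('bnd,bmd->bnm', q, k)   # (batch, n, n)
    if attn_mask is not None:
        weights = weights * attn_mask              # e.g. a lower-triangular mask
    denom = weights.sum(dim=-1, keepdim=True) + eps
    return (weights / denom) @ v

def linear_causal(q, k, v, eps=1e-6):
    """Linear form: cumulative sums over keys play the role of the causal mask."""
    kv = torch.einsum('bnd,bne->bnde', k, v).cumsum(dim=1)  # running K^T V per position
    z = k.cumsum(dim=1)                                     # running sum of keys
    num = torch.einsum('bnd,bnde->bne', q, kv)
    den = torch.einsum('bnd,bnd->bn', q, z) + eps
    return num / den[..., None]
```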