cosFormer
Attn Mask for Non-causal Models
We are examining non-NLP applications of the cosFormer self-attention and would need to use attention masking for the padded tokens in the batch. Is there a way to incorporate this, given that the code does not explicitly compute the attention weights on which masking is traditionally applied?
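One option, since the N x N attention matrix is never materialized: zero out the padded positions in the feature-mapped keys (and the values) before the key-value aggregation, which is equivalent to assigning those tokens zero attention weight everywhere. Below is a minimal non-causal sketch; the function name and `key_padding_mask` argument are illustrative rather than part of this repository, and the cos re-weighting of cosFormer is omitted for brevity (the same key masking would apply to each re-weighted term).

```python
# Minimal sketch (not the repository's code): non-causal linear attention
# with a key padding mask applied multiplicatively to the keys/values.
import torch
import torch.nn.functional as F

def linear_attention_with_padding(q, k, v, key_padding_mask, eps=1e-6):
    """
    q, k, v:          (batch, seq_len, dim)
    key_padding_mask: (batch, seq_len) bool, True at padded positions
    """
    # Non-negative feature map (ReLU), as used for phi in the paper;
    # the cos re-weighting is left out to keep the sketch short.
    q = F.relu(q)
    k = F.relu(k)

    # Zero the feature-mapped keys (and values) at padded positions, which
    # is equivalent to giving those tokens zero attention weight.
    keep = (~key_padding_mask).unsqueeze(-1).to(k.dtype)  # (batch, seq_len, 1)
    k = k * keep
    v = v * keep

    # Aggregate keys and values once; padded positions no longer contribute
    # to either the numerator or the normalization term.
    kv = torch.einsum('bnd,bne->bde', k, v)   # (batch, dim, dim)
    z = k.sum(dim=1)                          # (batch, dim)

    out = torch.einsum('bnd,bde->bne', q, kv)
    denom = torch.einsum('bnd,bd->bn', q, z).clamp(min=eps).unsqueeze(-1)
    return out / denom
```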
Can you provide some examples of before/after masking?
For example, the Swin Transformer mask.
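For comparison, a Swin-style mask is additive on an explicitly computed score matrix (blocked pairs are filled with a large negative value before the softmax), as in the generic snippet below; with linear attention there is no score matrix to add it to, which is why the multiplicative key masking sketched above is the usual workaround. This is an illustration only, not code from either repository.

```python
# Illustrative only: additive (Swin-style) masking on an explicit score matrix.
import torch

def softmax_attention_with_additive_mask(q, k, v, attn_mask):
    """
    q, k, v:   (batch, seq_len, dim)
    attn_mask: (batch, seq_len, seq_len), 0.0 where attention is allowed,
               a large negative value (e.g. -100.0) where it is blocked
    """
    scores = torch.einsum('bnd,bmd->bnm', q, k) / q.shape[-1] ** 0.5
    scores = scores + attn_mask          # blocked pairs get ~0 weight after softmax
    weights = scores.softmax(dim=-1)
    return torch.einsum('bnm,bmd->bnd', weights, v)
```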