
Attn Mask for Non-causal Models

Open roshansh-cmu opened this issue 2 years ago • 2 comments

We are examining non-NLP applications of the cosFormer self-attention, and we need to apply attention masking over the padded tokens in a batch. Is there a way to incorporate this? The code never explicitly computes the attention weights to which such a mask is traditionally applied.

roshansh-cmu avatar Mar 09 '22 23:03 roshansh-cmu
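As a point of reference for the discussion, here is a minimal sketch (not from the cosFormer repository) of how a key-padding mask is commonly handled in non-causal linear attention: because the weights are never materialized, one can instead zero out the feature-mapped keys at padded positions before the `K^T V` aggregation, which has the same effect as setting those attention weights to zero. The function name, tensor layout, and the `key_padding_mask` convention are assumptions for illustration; cosFormer's cos-based reweighting is assumed to already be folded into `q` and `k`.

```python
import torch

def masked_linear_attention(q, k, v, key_padding_mask, eps=1e-6):
    """Non-causal linear attention with padded keys removed (illustrative sketch).

    q, k: (batch, heads, seq, dim), already passed through the kernel
          feature map (e.g. elu+1, or cosFormer's cos/sin reweighting).
    v:    (batch, heads, seq, dim_v)
    key_padding_mask: (batch, seq) bool, True at padded positions.
    """
    # Zero out padded keys. This removes their contribution to both the
    # numerator (K^T V) and the denominator (the normalizer), which is the
    # linear-attention equivalent of masking their attention weights to 0.
    keep = (~key_padding_mask)[:, None, :, None].to(k.dtype)  # (b, 1, n, 1)
    k = k * keep

    kv = torch.einsum("bhnd,bhne->bhde", k, v)                 # (b, h, d, e)
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)        # (b, h, n, e)
    return out
```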

> We are examining non-NLP applications of the cosFormer self-attention, and we need to apply attention masking over the padded tokens in a batch. Is there a way to incorporate this? The code never explicitly computes the attention weights to which such a mask is traditionally applied.

Can you provide some examples before/after masking?

Doraemonzzz avatar Mar 12 '22 09:03 Doraemonzzz

For example, the attention mask used in the Swin Transformer. [image failed to upload: image.png]

npzl avatar Sep 07 '23 06:09 npzl
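For contrast, the Swin-style mask mentioned above is an additive bias applied to explicit attention logits before the softmax, which is exactly the step that has no direct counterpart in cosFormer's linear attention. A minimal, hedged sketch of that standard mechanism (function name and mask convention are illustrative, not from either codebase):

```python
import torch

def softmax_attention_with_additive_mask(q, k, v, attn_mask):
    """Vanilla softmax attention with a Swin-style additive mask (illustrative).

    attn_mask: (..., seq_q, seq_k) with 0 at allowed positions and a large
    negative value (e.g. -100 or -inf) at disallowed ones. It is added to
    the logits before the softmax; linear attention never forms these
    logits, hence the question in this issue.
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.einsum("...nd,...md->...nm", q, k) * scale
    logits = logits + attn_mask
    weights = torch.softmax(logits, dim=-1)
    return torch.einsum("...nm,...md->...nd", weights, v)
```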