Doraemonzzz
> We are examining non-NLP applications of the cosformer self-attention, and would need to use attention masking for the padded tokens in the batch. Is there a way to incorporate...
When using the forward() function, there is no direct way to apply an attention mask, since we never explicitly compute the attention matrix. If you need an attention mask, we suggest you use left_product,...
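For context, here is a minimal sketch of the two computation orders, assuming PyTorch and non-negative feature-mapped queries/keys (as in cosFormer). The `left_product` below only illustrates the quadratic order where masking becomes possible; it is not necessarily the repo's exact API:

```python
import torch

def right_product(q, k, v):
    # Linear-attention order: phi(Q) @ (phi(K)^T V).
    # The (len x len) attention matrix is never materialized,
    # so there is nowhere to apply a per-position mask.
    kv = torch.einsum("nld,nlm->ndm", k, v)                        # (batch, d, m)
    z = 1.0 / (torch.einsum("nld,nd->nl", q, k.sum(dim=1)) + 1e-6) # row normalizer
    return torch.einsum("nld,ndm,nl->nlm", q, kv, z)

def left_product(q, k, v, attn_mask=None):
    # Quadratic order: (phi(Q) phi(K)^T) @ V.
    # The full attention matrix is formed, so padded positions
    # can be zeroed out before the weighted sum over values.
    attn = torch.einsum("nld,nsd->nls", q, k)                      # (batch, len, len)
    if attn_mask is not None:
        # attn_mask: boolean, True = attend, False = masked (e.g. padding)
        attn = attn.masked_fill(~attn_mask, 0.0)
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return torch.einsum("nls,nsm->nlm", attn, v)
```

Both paths compute the same result when no mask is given; the trade-off is that the left product restores masking at the cost of the quadratic memory that linear attention was designed to avoid.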