Doraemonzzz
> We are examining non-NLP applications of the cosformer self-attention, and would need to use attention masking for the padded tokens in the batch. Is there a way to incorporate...
When using the forward() function, there is no direct way to apply an attention mask, since we never explicitly compute the attention matrix. If you need an attention mask, we suggest you use left_product,...
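For context, here is a minimal sketch of the two computation orders, assuming PyTorch and non-negative feature-mapped queries/keys (as in cosFormer). The `left_product` below only illustrates the quadratic order where masking becomes possible; it is not necessarily the repo's exact API:

```python
import torch

def right_product(q, k, v):
    # Linear-attention order: phi(Q) @ (phi(K)^T V).
    # The (len x len) attention matrix is never materialized,
    # so there is nowhere to apply a per-position mask.
    kv = torch.einsum("nld,nlm->ndm", k, v)                        # (batch, d, m)
    z = 1.0 / (torch.einsum("nld,nd->nl", q, k.sum(dim=1)) + 1e-6) # row normalizer
    return torch.einsum("nld,ndm,nl->nlm", q, kv, z)

def left_product(q, k, v, attn_mask=None):
    # Quadratic order: (phi(Q) phi(K)^T) @ V.
    # The full attention matrix is formed, so padded positions
    # can be zeroed out before the weighted sum over values.
    attn = torch.einsum("nld,nsd->nls", q, k)                      # (batch, len, len)
    if attn_mask is not None:
        # attn_mask: boolean, True = attend, False = masked (e.g. padding)
        attn = attn.masked_fill(~attn_mask, 0.0)
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return torch.einsum("nls,nsm->nlm", attn, v)
```

Both paths compute the same result when no mask is given; the trade-off is that the left product restores masking at the cost of the quadratic memory that linear attention was designed to avoid.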