BiLLM
Interaction of padding and bidirectional mask
Hi,
Thanks for sharing this very interesting work. I have a question about how the bidirectional attention mask is implemented here.
Based on this implementation, it seems that even the padding tokens in a batch get unmasked, whereas they should remain masked under both unidirectional and bidirectional attention. Is my understanding correct?
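
For concreteness, here is a rough sketch (not taken from this repo) of how I would expect padding to be handled when switching to a bidirectional mask. The function name and the Hugging Face-style `attention_mask` convention (1 = real token, 0 = padding) are just my assumptions:

```python
import torch

def build_bidirectional_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Sketch: bidirectional attention that still respects padding.

    attention_mask: (batch, seq_len) with 1 for real tokens and 0 for padding.
    Returns an additive mask of shape (batch, 1, seq_len, seq_len) where every
    query can attend to every non-padding key, but padded keys stay masked out.
    """
    batch, seq_len = attention_mask.shape
    # Broadcast the padding mask over the query dimension: (B, 1, 1, S).
    key_mask = attention_mask[:, None, None, :].to(torch.bool)
    additive = torch.zeros(batch, 1, seq_len, seq_len)
    # Padded key positions get a large negative value so softmax ignores them.
    additive = additive.masked_fill(~key_mask, torch.finfo(additive.dtype).min)
    return additive


if __name__ == "__main__":
    # Batch of 2, seq_len 4; the second sequence has one padding token.
    attn = torch.tensor([[1, 1, 1, 1],
                         [1, 1, 1, 0]])
    mask = build_bidirectional_mask(attn)
    print(mask[1, 0])  # last key column stays masked for the padded sequence
```

In other words, I would expect the padding mask to be applied on top of the all-ones bidirectional pattern, rather than the whole mask being unmasked. Am I misreading the code?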