
Interaction of padding and bidirectional mask

Open · vaibhavad opened this issue on Apr 10, 2024 · 0 comments

Hi,

Thanks for sharing this very interesting work. I have a question about how the bidirectional attention mask is implemented here.

Based on this implementation, it seems like even the padding tokens in a batch will get unmasked, whereas they should remain masked in both unidirectional and bidirectional attention. Is my understanding correct?
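To make the concern concrete, here is a minimal NumPy sketch (not BiLLM's actual code, just an illustration of the issue): naively turning a causal mask into a bidirectional one by letting every position attend everywhere also unmasks padding positions, whereas a padding-aware bidirectional mask should only connect real tokens to real tokens.

```python
import numpy as np

# attention_mask: (batch, seq_len); 1 = real token, 0 = padding.
# One sequence of length 5 with 2 trailing pad tokens.
attention_mask = np.array([[1, 1, 1, 0, 0]])
seq_len = attention_mask.shape[1]

# Naive bidirectional mask: every position attends to every position,
# including padding -- the behaviour this issue asks about.
naive = np.ones((1, seq_len, seq_len), dtype=bool)

# Padding-aware bidirectional mask: position (q, k) is visible only if
# both the query token q and the key token k are real tokens.
padded = (attention_mask[:, :, None] * attention_mask[:, None, :]).astype(bool)

# Under the naive mask, real token 0 can attend to pad token 3;
# under the padding-aware mask, it cannot.
```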

— vaibhavad, Apr 10 '24 18:04