candle
fix: wrong mask for distilbert::MultiHeadSelfAttention
It seems that the attention mask should be inverted first in distilbert::MultiHeadSelfAttention: in the reference transformers implementation, positions where the mask is 0 (padding) are the ones filled with a large negative value before the softmax, not the positions where it is 1. See:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/distilbert/modeling_distilbert.py#L218
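For reference, here is a minimal, self-contained sketch of the intended masking behavior (not the actual patch). It assumes a HuggingFace-style attention mask with 1 for real tokens and 0 for padding, and uses a `masked_fill` helper written in the style of the candle examples; the score values are dummy data.

```rust
use candle_core::{D, Device, Result, Tensor};
use candle_nn::ops::softmax;

/// Fill `scores` with `value` wherever `mask` is non-zero
/// (same pattern as the `masked_fill` helpers in the candle examples).
fn masked_fill(scores: &Tensor, mask: &Tensor, value: f32) -> Result<Tensor> {
    let fill = Tensor::new(value, scores.device())?.broadcast_as(scores.shape().dims())?;
    mask.where_cond(&fill, scores)
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    // HF-style attention mask: 1 = real token, 0 = padding (bs=1, k_length=4).
    let attention_mask = Tensor::new(&[[1u8, 1, 1, 0]], &device)?;
    // Dummy attention scores (bs=1, q_length=1, k_length=4).
    let scores = Tensor::new(&[[[0.5f32, 0.2, 0.1, 0.9]]], &device)?;

    // Invert first: `mask == 0` marks the padding positions that must be
    // filled with a large negative value (transformers uses the dtype's
    // minimum; -inf behaves the same way for this illustration).
    let pad_mask = attention_mask.eq(0u8)?; // 1 where padding
    let pad_mask = pad_mask.unsqueeze(1)?.broadcast_as(scores.shape().dims())?;
    let scores = masked_fill(&scores, &pad_mask, f32::NEG_INFINITY)?;

    // The padded position now receives ~0 attention weight after softmax.
    let weights = softmax(&scores, D::Minus1)?;
    println!("{weights}");
    Ok(())
}
```

Without the inversion, the fill is applied where the mask is 1, i.e. the real tokens get masked out and attention flows to the padding instead.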