Why is wav2vec2-base-960h trained without using an attention mask?
I have seen the code of Wav2Vec2FeatureExtractor in transformers, and it says that the model wav2vec2-base-960h was trained without using an attention mask.
I wonder why and how the model was trained without an attention mask to mask out the padded positions.
Doesn't this introduce errors when the padded positions are included in the computation?
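For reference, this is roughly how I checked it (a minimal sketch, assuming the facebook/wav2vec2-base-960h checkpoint on the Hugging Face Hub is the one in question):

```python
from transformers import Wav2Vec2FeatureExtractor

# Load the feature extractor shipped with the checkpoint
# (assuming facebook/wav2vec2-base-960h is the checkpoint meant here)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h"
)

# For this checkpoint the extractor reports that it does not return an
# attention mask by default, which is what prompted my question
print(feature_extractor.return_attention_mask)
```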