Wav2Vec2_PyCTCDecode

Word-level or char-level language model?

Open: MagedSaeed opened this issue 3 years ago • 1 comment

Thanks @patrickvonplaten for this repo; it really helped a lot!

Just a question: what is the best language model for CTC decoding, a character-level or a word-level one? I assumed a character-level model would be the natural choice, since wav2vec decodes characters. However, common practice seems to be to use a word-level model; I have noticed this in many repos and posts. Please correct me if I am wrong. If that is the case, could you please explain why word-level language models are preferred over character-level ones?
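For context on how the two fit together: a CTC model emits per-frame character labels, and the decoder collapses them into a character string (merge consecutive repeats, then drop blanks). A word-level LM can still be applied on top of that output by scoring the text word by word. Below is a minimal, hypothetical sketch of that idea using greedy CTC collapsing and a toy add-one-smoothed bigram word LM. It is not pyctcdecode's implementation (which runs a beam search with a KenLM model); `ctc_collapse`, `ToyBigramLM`, the `_` blank symbol, and the plain-space word separator are all assumptions made for illustration.

```python
import math
from collections import Counter

BLANK = "_"  # hypothetical CTC blank symbol for this sketch


def ctc_collapse(frame_labels: list[str], blank: str = BLANK) -> str:
    """Greedy CTC collapse: merge consecutive repeats, then remove blanks."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)


class ToyBigramLM:
    """Hypothetical word-level bigram LM with add-one (Laplace) smoothing."""

    def __init__(self, corpus: list[str]):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in corpus:
            words = ["<s>"] + sentence.split()
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, prev: str, word: str) -> float:
        # Add-one smoothing so unseen word pairs get a small, nonzero probability.
        num = self.bigrams[(prev, word)] + 1
        den = self.unigrams[prev] + self.vocab_size
        return math.log(num / den)

    def score(self, text: str) -> float:
        # Score whole words (split on spaces), not individual characters.
        words = ["<s>"] + text.split()
        return sum(self.log_prob(a, b) for a, b in zip(words, words[1:]))


lm = ToyBigramLM(["the cat sat", "the cat ran"])
text = ctc_collapse(["t", "t", "h", "e", BLANK, " ", "c", "a", "a", "t"])
print(text)  # the cat
print(lm.score(text) > lm.score("the kat"))  # True
```

In this toy setup the LM prefers the spelling that forms known words ("cat" over "kat"), even though the acoustic model only ever produced characters.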

MagedSaeed, Dec 18 '21

Did you find out? I'm facing the same question.

Oneliness, Apr 29 '22