pyctcdecode

OOV words with small LM

Open davidavdav opened this issue 2 years ago • 0 comments

Hello,

We have an application with a very small vocabulary (~100 words). Even with an almost trivial bigram model (KenLM does not seem to be able to build a unigram model), we see that decoder.decode() produces words that are not in the language model.
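To make the symptom concrete, here is a minimal sketch of how the out-of-vocabulary words can be spotted after decoding. It does not call pyctcdecode. The function `find_oov_words`, the word list, and the `decoded_text` string are all hypothetical stand-ins; in the real application `decoded_text` would be whatever decoder.decode() returns and the vocabulary would be the ~100 words used to build the language model.

```python
def find_oov_words(transcript, vocabulary):
    """Return the words in `transcript` that are absent from `vocabulary`.

    Comparison is case-insensitive; words are split on whitespace.
    """
    vocab = {word.lower() for word in vocabulary}
    return [word for word in transcript.split() if word.lower() not in vocab]


# Hypothetical stand-ins for the real vocabulary and decoder output.
vocabulary = ["turn", "on", "off", "the", "light"]
decoded_text = "turn onn the light"  # would come from decoder.decode(logits)

# Any word listed here was produced by the decoder but is not in the LM vocabulary.
print(find_oov_words(decoded_text, vocabulary))
```

A check like this only reports the problem after the fact; it does not stop the decoder from emitting such words in the first place.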

Is there some kind of fallback to letter decoding? Is there a way to turn this off?

Thanks!

davidavdav, Nov 06 '23 13:11