icefall
Can't understand why some output has OOV?
Hi, I see that some of my decoding output looks like this. Ground-truth transcript: "mở bài hát florentino". The word "florentino" is not a Vietnamese word (we call it a rare word, or a foreign word), and we have covered it in the lexicon: "florentino": "phờ lo ren ti nô" (its Vietnamese spelling). The output of my model (conformer + stateless transducer) decoded with LG is: "mở bài hát phờ lo ren ti nô". As you can see, the spelling "phờ lo ren ti nô" is covered by the lexicon, and the language model's training text also contains this sentence. But decoding with LG does not group the spelled syllables into the ground-truth word. In some other cases, I tried increasing the LM weight and the syllables were grouped, but in some cases they still were not. I'm not sure why the word is not output as it should be. I suspect that the silence weight, a threshold, or something similar needs tuning so that the spelled syllables get grouped. Can you give me some ideas for tuning? Thanks
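The observation that raising the LM weight fixes some cases but not others is what a log-linear score combination would predict when the two competing paths also differ in acoustic score. Below is a minimal sketch, assuming each path is ranked by `am_logprob + lm_scale * lm_logprob`. The class, function names and all scores are hypothetical and made up for illustration; they are not from the poster's model, and the exact LM-scale option depends on the decode script being used.

```python
from dataclasses import dataclass


@dataclass
class Hyp:
    words: str
    am_logprob: float  # hypothetical acoustic (transducer) log-prob of the path
    lm_logprob: float  # hypothetical n-gram LM log-prob of the word sequence


def total_score(h: Hyp, lm_scale: float) -> float:
    # Log-linear combination: acoustic score plus a scaled LM score.
    return h.am_logprob + lm_scale * h.lm_logprob


def best(hyps, lm_scale: float) -> str:
    # Return the word sequence of the highest-scoring hypothesis.
    return max(hyps, key=lambda h: total_score(h, lm_scale)).words


# Made-up numbers: the spelled path has the better acoustic score,
# the single-word path has the better LM score.
hyps = [
    Hyp("mở bài hát phờ lo ren ti nô", am_logprob=-10.0, lm_logprob=-30.0),
    Hyp("mở bài hát florentino", am_logprob=-12.0, lm_logprob=-25.0),
]

for scale in (0.1, 0.5, 1.0):
    print(scale, best(hyps, scale))
```

With these numbers the winner flips at `lm_scale = 0.4`, where `-10 - 30s = -12 - 25s`. Below that, the spelled form wins; above it, "florentino" wins. A case that stays wrong at every reasonable scale would then suggest the single-word path is missing or pruned, rather than merely outscored.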
What modeling units are you using?
Hi, I work with him. Do you mean what type of tokens we use?
Yes, BPE, phonemes, or something else?
We use phonemes. What is weird is that, for example, "phờ lo ren ti nô" and "florentino" have the same phonemes in our lexicon, and our LM is designed so that "florentino" has a higher LM score than "phờ lo ren ti nô" (which is completely meaningless), yet decoding still chooses "phờ lo ren ti nô".
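To make the point above concrete: if both word sequences expand through the lexicon to the same phoneme sequence, the acoustic model sees identical input. Under exact search and an identical alignment (both assumptions), the AM term then cancels, and the LM term alone decides the ranking, for any positive LM scale. Below is a minimal sketch with a hypothetical lexicon and hypothetical unigram log-probabilities. The phoneme symbols are placeholders, not the poster's real inventory. A real n-gram LM conditions on history, but its log-probabilities add up across words in the same way.

```python
# Hypothetical lexicon: placeholder phonemes, NOT the poster's inventory.
# The only property that matters is that both spellings expand identically.
LEXICON = {
    "phờ": ["f", "o2"],
    "lo": ["l", "o"],
    "ren": ["r", "e", "n"],
    "ti": ["t", "i"],
    "nô": ["n", "o6"],
    "florentino": ["f", "o2", "l", "o", "r", "e", "n", "t", "i", "n", "o6"],
}

# Hypothetical unigram log10-probabilities (toy stand-in for an n-gram LM).
LM_LOG10 = {
    "phờ": -4.0, "lo": -3.0, "ren": -4.5, "ti": -3.5, "nô": -4.0,
    "florentino": -5.0,
}


def expand(words):
    """Map a word sequence to the phoneme sequence the acoustic model sees."""
    phones = []
    for w in words:
        phones.extend(LEXICON[w])
    return phones


def lm_score(words):
    """Sum of per-word LM log-probabilities; one cost is paid per word."""
    return sum(LM_LOG10[w] for w in words)


spelled = ["phờ", "lo", "ren", "ti", "nô"]
single = ["florentino"]

print("same phonemes:", expand(spelled) == expand(single))
print("LM spelled:", lm_score(spelled))  # five word costs
print("LM single:", lm_score(single))    # one word cost
```

In this idealized setting, the spelled path pays five LM costs and loses. If real decoding still prefers it, something outside this comparison must be responsible: for example, the single-word path being pruned during search, or the AM scores of the two paths not actually being equal.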