icefall

Can't understand why some output has OOV words?

Open trangtv57 opened this issue 3 years ago • 4 comments

Hi, some of my decoding output looks like this. Ground-truth transcript: "mở bài hát florentino". The word "florentino" is not a Vietnamese word (we call it a rare word, or a foreign word). We cover it in the lexicon: "florentino": "phờ lo ren ti nô" (its spelling in Vietnamese syllables). The output of my model (conformer + stateless transducer) decoded with LG is: "mở bài hát phờ lo ren ti nô". As you can see, the spelling "phờ lo ren ti nô" is covered by the lexicon, and the language model also has this sentence in its training data. But decoding with LG does not group the spelled-out syllables into the ground-truth word. In some other cases I tried increasing the LM weight; some of them were then grouped, but others still were not. I'm not sure why the output is not the word it should be. I suspect the silence weight, a threshold, or something similar needs tuning so that the spelling gets grouped. Can you give me some ideas for tuning? Thanks.

trangtv57 avatar Nov 12 '22 14:11 trangtv57
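
The ambiguity described above can be reproduced with a toy lexicon. This is a minimal sketch, not icefall code: `TOY_LEXICON`, `nfc`, and `expand` are hypothetical names, and syllables stand in for the real modeling units, which the report does not list. It only shows that the reference and the decoded output expand to the same unit sequence, so a model scoring units cannot tell them apart by itself.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so visually identical Vietnamese strings compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy lexicon: word -> list of modeling units.
# Syllables are used as stand-in units; a real lexicon would list phonemes.
RAW_LEXICON = {
    "mở": ["mở"],
    "bài": ["bài"],
    "hát": ["hát"],
    "florentino": ["phờ", "lo", "ren", "ti", "nô"],
    "phờ": ["phờ"],
    "lo": ["lo"],
    "ren": ["ren"],
    "ti": ["ti"],
    "nô": ["nô"],
}
TOY_LEXICON = {nfc(w): [nfc(u) for u in units] for w, units in RAW_LEXICON.items()}


def expand(sentence, lexicon):
    """Map a space-separated word sequence to its flat list of units."""
    units = []
    for word in nfc(sentence).split():
        units.extend(lexicon[word])
    return units


reference = "mở bài hát florentino"
hypothesis = "mở bài hát phờ lo ren ti nô"

# Both word sequences yield exactly the same units.
same_units = expand(reference, TOY_LEXICON) == expand(hypothesis, TOY_LEXICON)
print(same_units)
```

When two word sequences collide like this, the choice between them is left to whatever scores the graph attaches to the words themselves.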

What modeling units are you using?

csukuangfj avatar Nov 13 '22 02:11 csukuangfj

Hi, I work with him. Do you mean what type of token we use?

ncakhoa avatar Nov 13 '22 04:11 ncakhoa

Yes, BPE, phonemes, or something else?

csukuangfj avatar Nov 13 '22 05:11 csukuangfj

We use phonemes. What is strange is that, for example, "phờ lo ren ti nô" and "florentino" have the same phonemes in our lexicon, and our LM is designed so that "florentino" has a higher LM score than "phờ lo ren ti nô" (which is completely meaningless), but the decoding still chooses "phờ lo ren ti nô".

ncakhoa avatar Nov 13 '22 07:11 ncakhoa
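
The expectation in the comment above can be written out as a toy calculation. This is a sketch under stated assumptions, not how icefall computes scores: every number is invented, `lm_scale`, `total_score`, and `best_path` are hypothetical names, and the two paths are assumed to receive the same model score because they share the same phonemes.

```python
# All scores are invented log-probabilities used only for illustration.
MODEL_SCORE = -20.0  # assumed identical for both paths (same phonemes)

# Hypothetical LM log-probabilities for the final part of each path.
LM_SCORE = {
    "florentino": -9.0,                      # one word
    "phờ lo ren ti nô": 5 * -2.5,            # five separate words
}


def total_score(path, lm_scale):
    """Combine the shared model score with a scaled LM score."""
    return MODEL_SCORE + lm_scale * LM_SCORE[path]


def best_path(lm_scale):
    """Return the path with the highest combined score."""
    return max(LM_SCORE, key=lambda path: total_score(path, lm_scale))


for lm_scale in (0.1, 0.5, 1.0):
    print(lm_scale, best_path(lm_scale))
```

Under these assumptions the single-word path wins for every positive `lm_scale`, which matches the expectation stated in the comment; the observed output would therefore have to come from something this toy model leaves out.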