Problematic insertions in the INSERTION report

Open AlexandderGorodetski opened this issue 2 years ago • 5 comments

Hello guys,

In the INSERTIONS report of the tedlium3 recipe I see the following:

INSERTIONS: count hyp 463 ⁇

I remember that in Kaldi UNK is deleted before calculating WER. So, I believe that ?? should not be counted as an error. What do you think?

Note that once I ignore these errors, I do get a WER of 6.7%, as in your results.
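A minimal sketch of what "ignoring these errors" could look like at the word level, assuming the unknown token shows up in the hypothesis text as the single character ⁇ (U+2047). The helper name `drop_unk_words` is hypothetical and is not part of icefall or Kaldi.

```python
# Assumption: the unknown token is rendered in the hypothesis as U+2047 ("⁇").
UNK_WORD = "\u2047"


def drop_unk_words(words, unk_word=UNK_WORD):
    """Return the word list without unknown-token placeholders."""
    return [w for w in words if w != unk_word]


# Made-up hypothesis, for illustration only.
hyp = "and \u2047 to the \u2047 of".split()
print(drop_unk_words(hyp))
```

Applying this to each hypothesis before scoring would keep the placeholder out of the insertion counts.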

Thanks a lot, AlexG.

Nov 28 '22 16:11 AlexandderGorodetski

The previous note relates to greedy search. The same thing occurs when I check the beam search results (i.e. removing the ?? insertion errors brings me to the results reported in your RESULTS.md file).

Nov 28 '22 17:11 AlexandderGorodetski

Here is the result for greedy search but I cannot find double ? in it.

https://huggingface.co/luomingshuang/icefall_asr_tedlium3_pruned_transducer_stateless/blob/main/log/greedy_search/errs-test-greedy_search-epoch-29-avg-13-context-2-max-sym-per-frame-3.txt

Nov 28 '22 23:11 csukuangfj

@csukuangfj you are right. In your file there are no double ?.

But in my case such errors do appear. For example:

INSERTIONS: count hyp 191 ⁇ 21 and 12 to 9 of

Could you please recommend how to disable printing of the double ? in the decoder output?

Nov 29 '22 06:11 AlexandderGorodetski

> But in my case such errors do appear. For example:

Could you open the file outside of your terminal? Maybe your terminal cannot display Unicode characters.

Nov 29 '22 06:11 csukuangfj

for hyp in hyp_tokens:
    hyps.append(sp.decode([x for x in hyp if x != 2]).split())

This works if 2 is the integer corresponding to <unk> in your tokens.txt.
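To avoid hardcoding 2, the ID could be looked up first. The following is only a sketch: it assumes tokens.txt holds one "symbol id" pair per line, and the helper names `find_unk_id` and `strip_unk_ids` are hypothetical. The sample text is made up for illustration.

```python
def find_unk_id(tokens_text, unk_symbol="<unk>"):
    """Look up the ID of unk_symbol in text with one 'symbol id' pair per line.

    The line format is an assumption about tokens.txt.
    """
    for line in tokens_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == unk_symbol:
            return int(parts[1])
    raise ValueError(f"{unk_symbol} not found")


def strip_unk_ids(hyp_tokens, unk_id):
    """Remove unk_id from every hypothesis (each a list of token IDs)."""
    return [[x for x in hyp if x != unk_id] for hyp in hyp_tokens]


# Made-up token table and hypotheses, for illustration only.
sample = "<blk> 0\n<sos/eos> 1\n<unk> 2\n\u2581the 3\n"
unk_id = find_unk_id(sample)
print(strip_unk_ids([[3, 2, 5], [2, 2]], unk_id))
```

The filtered ID lists can then be passed to sp.decode as in the loop above. If sp is a sentencepiece.SentencePieceProcessor, sp.unk_id() should return the same ID without reading tokens.txt.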

Dec 04 '22 21:12 armusc