icefall
Problematic insertions in the INSERTION report
Hello guys,
I see the following in the INSERTION report of the tedlium3 recipe:
INSERTIONS: count hyp 463 ⁇
I remember that in Kaldi UNK is deleted before calculating WER. So, I believe that ?? should not be counted as an error. What do you think?
Note that once I ignore these errors, I do get a WER of 6.7%, as in your results.
Thanks a lot, AlexG.
The previous note relates to greedy search. The same thing occurs when I check the beam search results (i.e., deleting the ?? insertion errors brings me to the results reported in your RESULTS.md file).
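The effect described above can be reproduced with a small sketch: score a hypothesis with and without the ⁇ words. This is not icefall's scoring code. `UNK_WORD`, `drop_unk_words`, and `edit_distance` are hypothetical names used for illustration, and the sketch assumes the unknown token shows up in the hypothesis as the single character U+2047.

```python
from typing import List

# Assumption: the unknown token appears in hypotheses as U+2047
# (DOUBLE QUESTION MARK), which some terminals render as "??".
UNK_WORD = "\u2047"


def drop_unk_words(words: List[str]) -> List[str]:
    """Return the word list without unknown-token words."""
    return [w for w in words if w != UNK_WORD]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(
                min(
                    prev[j] + 1,             # deletion: ref word has no match
                    cur[j - 1] + 1,          # insertion: extra hyp word
                    prev[j - 1] + (r != h),  # match or substitution
                )
            )
        prev = cur
    return prev[-1]


ref = "the cat sat".split()
hyp = f"the {UNK_WORD} cat sat".split()

errors_with_unk = edit_distance(ref, hyp)
errors_without_unk = edit_distance(ref, drop_unk_words(hyp))
print(errors_with_unk, errors_without_unk)
```

Here the only error in the raw hypothesis is the inserted ⁇, so removing it before scoring leaves no errors. Whether such tokens should be excluded from the reported WER is a scoring convention, not something this sketch decides.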
Here is the result for greedy search, but I cannot find a double ? in it.
https://huggingface.co/luomingshuang/icefall_asr_tedlium3_pruned_transducer_stateless/blob/main/log/greedy_search/errs-test-greedy_search-epoch-29-avg-13-context-2-max-sym-per-frame-3.txt
@csukuangfj you are right. In your file there are no double ?.
But in my case such errors do appear. For example:
INSERTIONS: count hyp 191 ⁇ 21 and 12 to 9 of
Could you please recommend how to disable printing of the double ? in the decoder output?
Could you open the file outside of your terminal? Maybe your terminal cannot display Unicode characters.
To drop it from the hypotheses, you can use:
for hyp in hyp_tokens:
    hyps.append(sp.decode([x for x in hyp if x != 2]).split())
if 2 is the integer corresponding to <unk> in your tokens.txt
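Rather than hard-coding 2, the ID could be looked up first. The sketch below assumes tokens.txt holds one "symbol id" pair per line; `load_token_table` and `strip_unk` are hypothetical helpers, not icefall functions, and the sample lines are made up for illustration.

```python
from typing import Dict, Iterable, List


def load_token_table(lines: Iterable[str]) -> Dict[str, int]:
    """Parse "symbol id" lines (assumed tokens.txt layout) into a dict."""
    table: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, idx = parts
        table[symbol] = int(idx)
    return table


def strip_unk(hyp_tokens: List[List[int]], unk_id: int) -> List[List[int]]:
    """Remove every occurrence of unk_id from each hypothesis."""
    return [[t for t in hyp if t != unk_id] for hyp in hyp_tokens]


# Made-up example lines; in practice they would be read from tokens.txt,
# e.g. with open("tokens.txt", encoding="utf-8").
sample_lines = ["<blk> 0", "<sos/eos> 1", "<unk> 2", "\u2581the 3", "\u2581cat 4"]
table = load_token_table(sample_lines)
unk_id = table["<unk>"]

hyp_tokens = [[3, 2, 4], [2, 2], [4]]
filtered = strip_unk(hyp_tokens, unk_id)
print(unk_id, filtered)
```

The filtered ID lists can then be passed to sp.decode(...) exactly as in the loop above. Note that a hypothesis consisting only of <unk> becomes an empty list.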