icefall
Streaming Zipformer2: extra words at the end
Hello,
I tried training a streaming Zipformer2, but I end up with small common words (like "and" or "it") or letters randomly being added to the output at the end of the string, as if a language model predicted them. It happens on a large percentage of the lines, and the high WER is largely due to these words; aside from that, errors are minimal. (I'm decoding with chunk size 32, left context 128, and modified beam search.)
I don't have this problem with the non-streaming version; no extra words are predicted there. Does anybody have a suggestion for what I could try to solve this?
Which decoding script did you use, decode.py or streaming_decode.py? I think the issue might be caused by the tail padding. You could try reducing the tail padding length:
https://github.com/k2-fsa/icefall/blob/b87ed26c09e9f5bb29174dd01f13670fb6124583/egs/librispeech/ASR/zipformer/decode.py#L439
https://github.com/k2-fsa/icefall/blob/b87ed26c09e9f5bb29174dd01f13670fb6124583/egs/librispeech/ASR/zipformer/streaming_decode.py#L586
However, you should pay attention to the tail deletion problem when you reduce the tail padding length.
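For anyone following along, tail padding just appends a stretch of "silent" filler frames to the features so the streaming encoder can flush its right context at the end of the utterance. Here is a minimal pure-Python sketch of the idea; the `pad_tail` helper and the exact padding value are illustrative assumptions, not the actual icefall code (which operates on torch tensors at the linked lines):

```python
import math

# icefall pads with log of a tiny value so the frames look like silence
# in log-mel space; treat the exact constant here as an assumption.
LOG_EPS = math.log(1e-10)

def pad_tail(features, tail_pad_len):
    """Append `tail_pad_len` filler frames to a list of feature frames
    (each frame a list of filterbank values). A longer tail gives the
    encoder more room to flush context, but also more "blank" audio in
    which a beam search can hallucinate extra tokens."""
    if tail_pad_len <= 0:
        return features
    num_bins = len(features[0])
    return features + [[LOG_EPS] * num_bins for _ in range(tail_pad_len)]

feats = [[0.0] * 80 for _ in range(100)]  # 100 frames, 80 mel bins
padded = pad_tail(feats, tail_pad_len=30)
print(len(padded))  # 130 frames after padding
```

This is why reducing the tail padding is the first thing to try when trailing insertions appear, and why reducing it too far risks deletions at the end instead.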
Thank you, I just tried. Reducing the value does not seem to help (I tried even with 0 and -30). However, I discovered that greedy search does not have the same problem as modified beam search: almost no extra words, even with tail_pad_len=30. Fast beam search also produces the extra words.
@joazoa So the issue is happening for both decode.py and streaming_decode.py?
I have only tried the streaming decode; I will try decode.py with simulated streaming next time I get server time.
You could also try setting length_norm to False for modified_beam_search when using streaming_decode.py (just for debugging):
https://github.com/k2-fsa/icefall/blob/b87ed26c09e9f5bb29174dd01f13670fb6124583/egs/librispeech/ASR/zipformer/decode_stream.py#L144
I just tried it. decode.py does not experience the issue (WER goes from 8.9 to 6.37 with the same settings). Training longer seems to reduce the number of extra words as well, but they are still there. Setting length_norm to False does not seem to help.
Thank you for your help.