
Streaming Zipformer2: extra words at the end

Open joazoa opened this issue 1 year ago • 6 comments

Hello,

I tried training a streaming Zipformer2, but small common words (like "and" or "it") or single letters are randomly appended to the end of the output string, as if a language model had predicted them. This affects a large percentage of the lines; the high WER is mostly due to these extra words, and other errors are minimal. (I'm decoding with chunk size 32, left context 128, and modified beam search.)

I don't have this problem with the non-streaming version; no extra words are predicted there. Does anybody have a suggestion for what I could try to solve this?

joazoa avatar Dec 06 '23 12:12 joazoa

Which decoding script did you use, decode.py or streaming_decode.py? I think the issue might be caused by the tail padding. You could try to reduce the tail padding length:

https://github.com/k2-fsa/icefall/blob/b87ed26c09e9f5bb29174dd01f13670fb6124583/egs/librispeech/ASR/zipformer/decode.py#L439

https://github.com/k2-fsa/icefall/blob/b87ed26c09e9f5bb29174dd01f13670fb6124583/egs/librispeech/ASR/zipformer/streaming_decode.py#L586

However, you should pay attention to the tail deletion problem when you reduce the tail padding length.
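For readers unfamiliar with tail padding, here is a minimal sketch of the idea, not the code at the links above. It assumes features are stored as a list of frames and that padding frames are filled with a log-epsilon value standing in for silence; the names `pad_tail`, `LOG_EPS`, and the toy feature shape are illustrative assumptions.

```python
import math

# Assumption: padding frames hold log(eps), a stand-in for silence in
# log-mel features. The name and value are illustrative.
LOG_EPS = math.log(1e-10)


def pad_tail(features, pad_len=30, pad_value=LOG_EPS):
    """Append pad_len silent frames to a (num_frames x feat_dim) list of lists.

    A non-positive pad_len leaves the utterance unchanged.
    """
    frames = [list(frame) for frame in features]
    if pad_len <= 0 or not frames:
        return frames
    feat_dim = len(frames[0])
    padding = [[pad_value] * feat_dim for _ in range(pad_len)]
    return frames + padding


# Toy utterance: 100 frames of 3-dimensional features.
utterance = [[0.1, 0.2, 0.3] for _ in range(100)]
padded = pad_tail(utterance, pad_len=30)
print(len(utterance), len(padded))
```

The trade-off is that with too little padding the model may not see enough frames to emit the final tokens (tail deletion), while the suspicion in this thread is that the padding itself might encourage extra tokens at the end.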

yaozengwei avatar Dec 06 '23 13:12 yaozengwei

Thank you, I just tried. Reducing the value does not seem to help (I tried even with 0 and -30). However, I discovered that greedy search does not have the same problem as modified beam search: almost no extra words, even with tail_pad_len=30. Fast beam search also produces the extra words.
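One way to compare decoding methods on this specific symptom is to count how often a hypothesis ends with words that are not in the reference. The sketch below is only an illustration using Python's standard `difflib`; the function name `trailing_insertions` and the example sentences are hypothetical, and it only detects pure insertions at the very end (a substituted last word followed by extras is not counted).

```python
from difflib import SequenceMatcher


def trailing_insertions(ref: str, hyp: str):
    """Return the words appended at the end of hyp that are absent from ref."""
    ref_words, hyp_words = ref.split(), hyp.split()
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    opcodes = matcher.get_opcodes()
    if not opcodes:
        return []
    tag, i1, _i2, j1, j2 = opcodes[-1]
    # The last edit must be an insertion located after the final reference word.
    if tag == "insert" and i1 == len(ref_words):
        return hyp_words[j1:j2]
    return []


# Hypothetical (reference, hypothesis) pairs, e.g. read from a results file.
pairs = [
    ("the cat sat", "the cat sat and"),
    ("the cat sat", "the cat sat"),
    ("hello world", "hello world it"),
]
affected = [(ref, hyp) for ref, hyp in pairs if trailing_insertions(ref, hyp)]
print(f"{len(affected)} of {len(pairs)} utterances end with extra words")
```

Running this on the output of greedy search, modified beam search, and fast beam search would give a per-method count of affected utterances, separate from the overall WER.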

joazoa avatar Dec 06 '23 14:12 joazoa

@joazoa So the issue is happening for both decode.py and streaming_decode.py?

marcoyang1998 avatar Dec 07 '23 01:12 marcoyang1998

I have only tried the streaming decode. I will try decode.py with simulated streaming the next time I get server time.

joazoa avatar Dec 07 '23 12:12 joazoa

You could also try setting length_norm to False for modified_beam_search when using streaming_decode.py (just for debugging):

https://github.com/k2-fsa/icefall/blob/b87ed26c09e9f5bb29174dd01f13670fb6124583/egs/librispeech/ASR/zipformer/decode_stream.py#L144
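To illustrate why length normalization could matter here, the following is a simplified sketch rather than the code at the link above. It assumes the final hypothesis is chosen by total log-probability, optionally divided by the number of tokens; the `Hyp` class, `best_hyp` function, and the scores are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hyp:
    tokens: List[str]
    log_prob: float  # total log-probability of the token sequence


def best_hyp(hyps, length_norm=True):
    """Pick the best hypothesis, optionally normalizing the score by length."""
    if length_norm:
        return max(hyps, key=lambda h: h.log_prob / max(len(h.tokens), 1))
    return max(hyps, key=lambda h: h.log_prob)


# Made-up scores: the longer hypothesis has a lower total log-probability
# but a better per-token average.
hyps = [
    Hyp(["the", "cat", "sat"], log_prob=-3.0),          # -1.0 per token
    Hyp(["the", "cat", "sat", "and"], log_prob=-3.6),   # -0.9 per token
]
print(best_hyp(hyps, length_norm=True).tokens)
print(best_hyp(hyps, length_norm=False).tokens)
```

With normalization the longer hypothesis wins because its average per-token score is higher, so turning it off is a cheap way to check whether the extra trailing words come from this selection step.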

yaozengwei avatar Dec 07 '23 13:12 yaozengwei

I just tried it: decode.py does not show the issue (WER goes from 8.9 to 6.37 with the same settings). Training longer also seems to reduce the number of extra words, but they are still there. Setting length_norm to False does not seem to help.

Thank you for your help.

joazoa avatar Dec 07 '23 17:12 joazoa