Question about CosyVoice streaming inference quality

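In the CosyVoice2 paper, the gap between streaming and non-streaming LLM inference is very small, but when I evaluated on test-zh I found that streaming is clearly worse than non-streaming. Because inference is stochastic, I ran it ten times and averaged the results: non-streaming CER = 1.399%, streaming CER = 3.227%. The streaming and non-streaming inference code is as follows: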

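def text_generator(txt):
    # Wrap the full text in a generator; a generator input selects the streaming-text path.
    yield txt

if is_stream:
    input_text = text_generator(text)
else:
    input_text = text
audios = []  # collected audio chunks
# Note: stream=True is passed in both cases; only the form of the text input differs.
for i, j in enumerate(cosyvoice.inference_zero_shot(input_text, prompt_text, prompt_speech_16k, stream=True)):
    audios.append(j['tts_speech'])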
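
For reference, the averaging described above can be sketched as follows. This is only a minimal illustration, assuming the ASR transcripts of the synthesized audio are already available as strings; it is not the exact scoring script I used, and the names `edit_distance`, `corpus_cer`, and `mean_cer_over_runs` are hypothetical helpers introduced here.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_cer(refs: List[str], hyps: List[str]) -> float:
    """Character error rate of one run: total character errors / total reference characters."""
    total_errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_errors / total_chars


def mean_cer_over_runs(refs: List[str], runs: List[List[str]]) -> float:
    """Average the per-run corpus CER over several stochastic inference runs."""
    return sum(corpus_cer(refs, hyps) for hyps in runs) / len(runs)
```

Here `runs` would hold one list of transcripts per inference run (ten in my case), each aligned with the reference texts in `refs`.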

In addition, the streaming inference code in llm.py has a bug (https://github.com/FunAudioLLM/CosyVoice/blob/587604b2b433bc350c344b4b181b47249b54faf2/cosyvoice/llm/llm.py#L502), which I patched as follows:

# 3. final decode
if prompt_speech_token_emb.size(1) == 0:
    lm_input = torch.concat([lm_input, text_cache, task_id_emb], dim=1)
else:
    lm_input = torch.concat([lm_input, text_cache, task_id_emb, prompt_speech_token_emb], dim=1)

Is there something wrong with my inference setup, or is there a problem with the results reported in the CosyVoice2 paper?