CosyVoice

When synthesizing long input text, the beginning of the audio always comes out choppy. What is going on?

Open · houliangxue opened this issue 1 month ago · 2 comments

The inference code is as follows:

# Imports needed by this fragment (the code below is assumed to live inside a FastAPI endpoint)
import io

import torchaudio
from fastapi.responses import StreamingResponse

if request.stream:
    print("################# streaming generation #################")

    def generate():
        # Call the model to generate audio in streaming mode
        # (note: adjust for speech rate if the model supports it)
        for chunk in cosyvoice.inference_zero_shot(
            request.input,       # OpenAI-style "input" parameter
            prompt_text,         # prompt text (timbre can follow the requested voice)
            prompt_speech_16k,
            stream=True,
        ):
            audio_tensor = chunk["tts_speech"]
            # Convert to the requested format (mp3/wav supported)
            buffer = io.BytesIO()
            torchaudio.save(
                buffer,
                audio_tensor,
                sample_rate=cosyvoice.sample_rate,
                format=request.response_format,  # format specified in the request
            )
            buffer.seek(0)
            yield buffer.read()

    # Streaming response media type: mp3 -> audio/mpeg, wav -> audio/wav
    media_type = "audio/mpeg" if request.response_format == "mp3" else "audio/wav"
    return StreamingResponse(generate(), media_type=media_type)
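
One way to smooth out the start of playback, shown here as a minimal sketch rather than a confirmed fix (it assumes the same cosyvoice, request, prompt_text and prompt_speech_16k objects as above; generate_prebuffered and PREBUFFER_CHUNKS are hypothetical names, not CosyVoice parameters): accumulate the first few serialized chunks and send them together, so the client begins playback with a small reserve of audio.

# Hypothetical variant of generate() that pre-buffers the first few chunks
# before sending anything, trading a slightly longer time-to-first-audio
# for a steadier start.
PREBUFFER_CHUNKS = 3

def generate_prebuffered():
    pending = []
    started = False
    for chunk in cosyvoice.inference_zero_shot(
        request.input, prompt_text, prompt_speech_16k, stream=True
    ):
        buffer = io.BytesIO()
        torchaudio.save(
            buffer,
            chunk["tts_speech"],
            sample_rate=cosyvoice.sample_rate,
            format=request.response_format,
        )
        buffer.seek(0)
        data = buffer.read()
        if not started:
            pending.append(data)
            if len(pending) >= PREBUFFER_CHUNKS:
                yield b"".join(pending)  # flush the initial reserve in one go
                pending.clear()
                started = True
        else:
            yield data
    if pending:  # stream ended with fewer than PREBUFFER_CHUNKS chunks
        yield b"".join(pending)

Whether this helps depends on how the client buffers and plays the stream; it only masks the gap at the start, it does not make generation faster.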

houliangxue commented on Nov 05 '25

@houliangxue How long does each of your stream iterations take? I am using streaming mode with vLLM acceleration; each inference step takes about 1 second, which makes receiving and playing the audio choppy.
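
For reference, a rough way to answer that question is to time each chunk and compare it with how much audio the chunk contains; this is a sketch assuming the same cosyvoice, prompt_text and prompt_speech_16k objects as in the code above, with a placeholder test string. If a chunk regularly takes longer to generate than it takes to play, the client will underrun and playback will stutter.

import time

t_prev = time.perf_counter()
for i, chunk in enumerate(cosyvoice.inference_zero_shot(
        "a long test text goes here", prompt_text, prompt_speech_16k, stream=True)):
    now = time.perf_counter()
    gen_s = now - t_prev  # wall-clock time spent producing this chunk
    # tts_speech is assumed to be a (channels, samples) tensor
    audio_s = chunk["tts_speech"].shape[-1] / cosyvoice.sample_rate
    print(f"chunk {i}: gen {gen_s:.2f}s, audio {audio_s:.2f}s, RTF {gen_s / audio_s:.2f}")
    t_prev = now

An RTF above 1 for a chunk means generation is not keeping up with real time for that chunk.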

sporterman commented on Nov 06 '25

I am seeing a similar problem. With a vLLM deployment and streaming calls, playback is occasionally choppy even without concurrent requests; with concurrent requests the choppiness becomes very frequent. The frequency also changes when cloning different reference voices. I am using the comet branch.

wmj9346464543 commented on Nov 07 '25

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented on Dec 08 '25