
Generated speech is garbled with transformers==4.53.1

Open yxf0314 opened this issue 5 months ago • 20 comments

With transformers==4.53.1 the generated speech is garbled; switching to 4.51.3 makes it work normally.

Here is the calling code:

    def speech(
        self,
        input: str,
        voice: Optional[str] = "Chinese Female",
        speed: float = 1,
        reponse_format: str = "mp3",
        **kwargs,
    ) -> str:
        if voice not in self._voices:
            raise ValueError(f"Voice {voice} not supported")

        original_voice = self._get_original_voice(voice)
        model_output = self._model.inference_sft(  # calls the CosyVoice method here
            input, original_voice, stream=False, speed=speed
        )
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=True) as temp_file:
            wav_file_path = temp_file.name
            with wave.open(wav_file_path, "wb") as wf:
                wf.setnchannels(1)  # mono
                wf.setsampwidth(2)  # 16-bit
                wf.setframerate(22050)  # Sample rate
                for i in model_output:
                    tts_audio = (
                        (i["tts_speech"].numpy() * (2**15)).astype(np.int16).tobytes()
                    )
                    wf.writeframes(tts_audio)

                output_file_path = convert(wav_file_path, reponse_format, speed)
                return output_file_path

Environment: Ubuntu 22.04, NVIDIA GeForce RTX 4090, CosyVoice version: 6b21f8e

yxf0314 avatar Jul 11 '25 02:07 yxf0314
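
For reference, the WAV-writing step in the calling code above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the code used in the report: `floats_to_pcm16` and `write_wav` are hypothetical helper names, the audio is assumed to arrive as chunks of floats in [-1.0, 1.0], and the sample rate is passed in instead of being hard-coded to 22050, so it can be matched to whatever rate the loaded model actually produces. Out-of-range samples are clipped before the 16-bit conversion, which the original snippet does not do.

```python
import struct
import wave


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes.

    Values outside the range are clipped instead of being allowed to overflow.
    """
    values = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    return struct.pack(f"<{len(values)}h", *values)


def write_wav(path, chunks, sample_rate):
    """Write an iterable of float chunks to a mono 16-bit WAV file.

    The file is closed when the `with` block exits, so any later
    conversion step reads a complete file.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        for chunk in chunks:
            wf.writeframes(floats_to_pcm16(chunk))
```

A call such as `write_wav("out.wav", chunks, 22050)` mirrors the settings in the snippet above. A mismatched sample rate would change the pitch and speed of the audio and would not by itself explain the garbled speech reported here.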

Are you using cosy2 (CosyVoice2)? I have the same problem.

ScottishFold007 avatar Jul 11 '25 08:07 ScottishFold007

Are you using cosy2 (CosyVoice2)? I have the same problem.

Yes, CosyVoice2-0.5B.

yxf0314 avatar Jul 11 '25 08:07 yxf0314

I just ran into this as well. Any idea what the cause is? The output is complete gibberish, and I'm using the official example.

ScottishFold007 avatar Jul 11 '25 08:07 ScottishFold007

The model was downloaded from the iic/CosyVoice2-0.5B repository.

ScottishFold007 avatar Jul 11 '25 08:07 ScottishFold007

I've uploaded 生成音频.zip as an attachment. Have a listen and see whether it's the same problem.

ScottishFold007 avatar Jul 11 '25 08:07 ScottishFold007

I've uploaded 生成音频.zip as an attachment. Have a listen and see whether it's the same problem.

Yes, exactly. It sounds like drunken rambling.

yxf0314 avatar Jul 11 '25 08:07 yxf0314

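I've uploaded 生成音频.zip as an attachment. Have a listen and see whether it's the same problem.

Yes, exactly. It sounds like drunken rambling.

Switching to transformers==4.40.1 fixed it immediately.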

ScottishFold007 avatar Jul 11 '25 09:07 ScottishFold007

This looks like a bug. Let's wait for an official fix.

ScottishFold007 avatar Jul 11 '25 09:07 ScottishFold007

Indeed. I upgraded in order to run vllm and it started happening.

qiao131 avatar Jul 11 '25 09:07 qiao131

I had no idea what the cause was, and checking dependencies one by one got me nowhere. It's finally sorted out.

BobMind758 avatar Jul 12 '25 09:07 BobMind758

Neither 4.51.3 nor 4.40.1 works for me; the speech is still garbled.

flashzq avatar Jul 23 '25 03:07 flashzq

Even after installing from requirements.txt, I still get the same problem: conformer==0.3.2 diffusers==0.27.2 gdown==5.1.0 gradio==4.32.2 grpcio==1.57.0 grpcio-tools==1.57.0 huggingface-hub==0.23.5 hydra-core==1.3.2 HyperPyYAML==1.2.2 inflect==7.3.1 librosa==0.10.2 lightning==2.2.4 matplotlib==3.7.5 modelscope==1.15.0 networkx==3.1 omegaconf==2.3.0 onnx==1.16.0 onnxruntime==1.18.0 openai-whisper==20231117 protobuf==4.25 pydantic==2.7.0 rich==13.7.1 soundfile==0.12.1 tensorboard==2.14.0 tensorrt-cu12==10.0.1 tensorrt-cu12-bindings==10.0.1 tensorrt-cu12-libs==10.0.1 torch==2.3.1 torchaudio==2.3.1 transformers==4.40.1 uvicorn==0.30.0 wget==3.2 fastapi==0.111.0 fastapi-cli==0.0.4 WeTextProcessing==1.0.3

flashzq avatar Jul 23 '25 03:07 flashzq

I also get this problem with transformers==4.53.2. This node has so many issues qwq. 4.51.3 works.

Kydon-ai avatar Jul 24 '25 08:07 Kydon-ai

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Aug 27 '25 02:08 github-actions[bot]

https://github.com/FunAudioLLM/CosyVoice/issues/1546#issuecomment-3232416350

double12gzh avatar Aug 28 '25 08:08 double12gzh

I found that the cutoff is v4.53.0. v4.53.0 does not work, but rolling back to transformers==4.52.4 works fine. I don't know what changed in this transformers release to cause it.

Rhythmblue avatar Sep 08 '25 12:09 Rhythmblue
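
The cutoff reported above can be turned into a quick check of the installed package. This is only a sketch based on the reports in this thread: the constant `FIRST_REPORTED_BAD` and the helper names are hypothetical, and the only library call used is the standard `importlib.metadata.version`. It does not diagnose the underlying cause.

```python
from importlib import metadata

# Assumption taken from this thread only: 4.53.0 is the first release
# reported to produce garbled speech; 4.52.4 is reported to work.
FIRST_REPORTED_BAD = (4, 53, 0)


def parse_version(text):
    """Turn '4.53.1' or '4.53.0.dev0' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:  # pad short versions such as '4.53'
        parts.append(0)
    return tuple(parts[:3])


def is_reported_affected(version_text):
    """True if the version is at or above the cutoff reported in this thread."""
    return parse_version(version_text) >= FIRST_REPORTED_BAD


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif is_reported_affected(found):
        print(f"transformers {found}: at or above the reported cutoff 4.53.0")
    else:
        print(f"transformers {found}: below the reported cutoff 4.53.0")
```

If the check flags the installed version, pinning to one of the releases reported as working in this thread (for example `transformers==4.52.4`) is the workaround several commenters describe. Other commenters still hear garbled output on 4.51.3 and 4.40.1, so the pin may not be sufficient in every setup.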

I tried several versions of transformers and all of them produce garbled speech. I'm not using vllm. Is there any way to fix this?

shanhaidexiamo avatar Sep 09 '25 12:09 shanhaidexiamo

Even after installing from requirements.txt, I still get the same problem: conformer==0.3.2 diffusers==0.27.2 gdown==5.1.0 gradio==4.32.2 grpcio==1.57.0 grpcio-tools==1.57.0 huggingface-hub==0.23.5 hydra-core==1.3.2 HyperPyYAML==1.2.2 inflect==7.3.1 librosa==0.10.2 lightning==2.2.4 matplotlib==3.7.5 modelscope==1.15.0 networkx==3.1 omegaconf==2.3.0 onnx==1.16.0 onnxruntime==1.18.0 openai-whisper==20231117 protobuf==4.25 pydantic==2.7.0 rich==13.7.1 soundfile==0.12.1 tensorboard==2.14.0 tensorrt-cu12==10.0.1 tensorrt-cu12-bindings==10.0.1 tensorrt-cu12-libs==10.0.1 torch==2.3.1 torchaudio==2.3.1 transformers==4.40.1 uvicorn==0.30.0 wget==3.2 fastapi==0.111.0 fastapi-cli==0.0.4 WeTextProcessing==1.0.3

How did you solve this?

shanhaidexiamo avatar Sep 09 '25 12:09 shanhaidexiamo

How did you figure this out? That's amazing.

ajkpix avatar Sep 26 '25 09:09 ajkpix

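I also get this problem with transformers==4.53.2. This node has so many issues qwq. 4.51.3 works.

It doesn't work for me with transformers 4.51.3 either; I get the same garbled audio.

I'm running the CosyVoice2 model CosyVoice2-0.5B. Startup and synthesis complete without errors, but the pronunciation is garbled. CosyVoice2(args.model_dir, load_jit=True, load_trt=True, load_vllm=True, fp16=True) I installed the versions the official repository specifies, and the synthesized speech is still garbled.

Are you getting garbled audio too? My problem is described in this thread: https://github.com/FunAudioLLM/CosyVoice/issues/1601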

worm128 avatar Oct 12 '25 11:10 worm128