Generated audio is gibberish

Open WangGewu opened this issue 5 months ago • 11 comments

I downloaded the latest code and the latest model (https://www.modelscope.cn/models/iic/CosyVoice2-0.5B/files) and ran the example synthesis code. The synthesized output is gibberish.

WangGewu avatar Jul 24 '25 12:07 WangGewu

Downgrade your transformers version to the one the project requires.

Yanceye avatar Jul 24 '25 12:07 Yanceye

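The comment above is right: the originally required transformers version works somewhat better, but it does not eliminate the problem completely. After repeated tests, I found that the result correlates strongly with the random seed.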
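Since the outcome seems to depend on the random seed, fixing the seed before each synthesis call makes runs comparable. A minimal sketch, assuming a generic helper (`seed_everything` is a hypothetical name, not a CosyVoice API) that seeds Python's `random` and, when they are installed, NumPy and PyTorch:

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is importable here."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws.
seed_everything(1234)
first = [random.random() for _ in range(3)]
seed_everything(1234)
second = [random.random() for _ in range(3)]
print(first == second)  # prints True
```

Calling the helper right before synthesis, with a few different seed values, is one way to tell whether bad output follows the seed or the environment.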

WLimin avatar Jul 24 '25 13:07 WLimin

After downgrading the version, the problem is solved.

WangGewu avatar Jul 25 '25 09:07 WangGewu

If I use vllm, which transformers version should I use?

WangGewu avatar Jul 26 '25 09:07 WangGewu

> If I use vllm, which transformers version should I use?

Hey, did you get this solved? Please let me know.

geekgogo avatar Jul 28 '25 10:07 geekgogo

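transformers has fairly strict version requirements. A version that is too high or too low can produce nonsensical audio, or long stretches of silent audio. I recommend matching the official versions exactly.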
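One way to check for an exact match is to compare the installed packages against the pinned versions. A minimal sketch, assuming the pins come from requirements-style text; `parse_pins` and `find_mismatches` are hypothetical helper names, and the pin shown is only an example value to be replaced with the project's own requirements:

```python
from importlib import metadata


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Collect exact `name==version` pins from requirements-style text."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue  # skip unpinned entries and option lines
        name, version = line.split("==", 1)
        version = version.split(";", 1)[0].strip()  # drop environment markers
        pins[name.strip().lower()] = version
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (pinned, installed or None)} for each differing package."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Example pin only; read the project's requirements file instead.
example_requirements = "transformers==4.40.1\n"
print(find_mismatches(parse_pins(example_requirements)))
```

An empty dictionary means every pinned package is installed at exactly the pinned version.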

Uncolor-Duck avatar Jul 30 '25 07:07 Uncolor-Duck

I ran vllm_example.py and have no idea what it is doing. My version list is below. My experiments show that right after the model is first loaded, whether the generated speech is normal depends on the random seed. Once a random seed has been set, later generations are mostly normal, even when the seed value is changed. However, the built-in "中文女" (Chinese female) voice sounds female, and "中文男" (Chinese male) still sounds female...

` (cosyvoice) webui@1324eb3a3bc4:/workspace/CosyVoice$ pip list
Package Version
absl-py 2.3.1
aiofiles 23.2.1
aiohappyeyeballs 2.6.1
aiohttp 3.12.14
aiosignal 1.4.0
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
annotated-types 0.7.0
antlr4-python3-runtime 4.9.3
anyio 4.9.0
archspec 0.2.5
astor 0.8.1
asttokens 3.0.0
astunparse 1.6.3
attrs 25.3.0
audioread 3.0.1
beautifulsoup4 4.13.4
blake3 1.0.5
blinker 1.9.0
boltons 24.0.0
Brotli 1.1.0
cachetools 6.1.0
cbor2 5.6.5
certifi 2025.4.26
cffi 1.17.1
chardet 5.2.0
charset-normalizer 3.4.2
click 8.2.1
cloudpickle 3.1.1
cmake 4.0.2
colorama 0.4.6
coloredlogs 15.0.1
compressed-tensors 0.10.2
conda 25.5.0
conda-build 25.5.0
conda_index 0.6.1
conda-libmamba-solver 25.3.0
conda-package-handling 2.4.0
conda_package_streaming 0.11.0
conformer 0.3.2
contourpy 1.3.3
crcmod 1.7
cryptography 45.0.5
cupy-cuda12x 13.5.1
cycler 0.12.1
Cython 3.1.2
decorator 5.2.1
deepspeed 0.15.1
depyf 0.19.0
diffusers 0.34.0
dill 0.4.0
diskcache 5.6.3
distro 1.9.0
dnspython 2.7.0
editdistance 0.8.1
einops 0.8.1
email_validator 2.2.0
evalidate 2.0.5
exceptiongroup 1.3.0
executing 2.2.0
expecttest 0.3.0
fastapi 0.115.6
fastapi-cli 0.0.8
fastapi-cloud-cli 0.1.5
fastrlock 0.8.3
ffmpy 0.6.1
filelock 3.18.0
Flask 3.1.1
flask-cors 6.0.1
flatbuffers 25.2.10
fonttools 4.59.0
frozendict 2.4.6
frozenlist 1.7.0
fsspec 2025.5.1
funasr 1.2.6
gdown 5.1.0
gguf 0.17.1
gradio 5.4.0
gradio_client 1.4.2
grpcio 1.57.0
grpcio-tools 1.57.0
h11 0.16.0
h2 4.2.0
hf-xet 1.1.5
hjson 3.1.0
hpack 4.1.0
httpcore 1.0.9
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.34.1
humanfriendly 10.0
hydra-core 1.3.2
hyperframe 6.1.0
HyperPyYAML 1.2.2
hypothesis 6.135.0
idna 3.10
importlib_metadata 8.7.0
importlib_resources 6.5.2
inflect 7.3.1
interegular 0.3.3
ipython 9.3.0
ipython_pygments_lexers 1.1.1
itsdangerous 2.2.0
jaconv 0.4.0
jamo 0.4.1
jedi 0.19.2
jieba 0.42.1
Jinja2 3.1.6
jiter 0.10.0
jmespath 0.10.0
joblib 1.5.1
jsonpatch 1.33
jsonpointer 3.0.0
jsonschema 4.24.0
jsonschema-specifications 2025.4.1
kaldifst 1.7.14
kaldiio 2.18.1
kiwisolver 1.4.8
lark 1.2.2
lazy_loader 0.4
libarchive-c 5.3
libmambapy 2.1.1
librosa 0.10.2
lief 0.16.4
lightning 2.5.2
lightning-utilities 0.15.0
lintrunner 0.12.7
llguidance 0.7.30
llvmlite 0.44.0
lm-format-enforcer 0.10.11
Markdown 3.8.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.7.5
matplotlib-inline 0.1.7
mdurl 0.1.2
menuinst 2.2.0
mistral_common 1.8.3
modelscope 1.20.0
more-itertools 10.7.0
mpmath 1.3.0
msgpack 1.1.0
msgspec 0.19.0
multidict 6.6.3
networkx 3.5
ninja 1.11.1.4
numba 0.61.2
numpy 1.26.4
nvidia-cublas-cu12 12.8.3.14
nvidia-cuda-cupti-cu12 12.8.57
nvidia-cuda-nvrtc-cu12 12.8.61
nvidia-cuda-runtime-cu12 12.8.57
nvidia-cudnn-cu12 9.7.1.26
nvidia-cufft-cu12 11.3.3.41
nvidia-cufile-cu12 1.13.0.11
nvidia-curand-cu12 10.3.9.55
nvidia-cusolver-cu12 11.7.2.55
nvidia-cusparse-cu12 12.5.7.53
nvidia-cusparselt-cu12 0.6.3
nvidia-nccl-cu12 2.26.2
nvidia-nvjitlink-cu12 12.8.61
nvidia-nvtx-cu12 12.8.55
omegaconf 2.3.0
onnx 1.17.0
onnxruntime 1.22.1
onnxruntime-gpu 1.22.0
openai 1.90.0
openai-whisper 20250625
opencv-python-headless 4.11.0.86
optree 0.16.0
orjson 3.11.1
oss2 2.19.1
outlines_core 0.2.10
packaging 25.0
pandas 2.3.1
parso 0.8.4
partial-json-parser 0.2.1.1.post6
pexpect 4.9.0
pickleshare 0.7.5
pillow 11.0.0
pip 24.0
pkginfo 1.12.1.2
pkgutil_resolve_name 1.3.10
platformdirs 4.3.8
pluggy 1.5.0
pooch 1.8.2
prometheus_client 0.22.1
prometheus-fastapi-instrumentator 7.1.0
prompt_toolkit 3.0.51
propcache 0.3.2
protobuf 4.25.0
psutil 7.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
pyarrow 21.0.0
pybase64 1.4.2
pycosat 0.6.6
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.23.0
pydantic 2.10.6
pydantic_core 2.27.2
pydantic-extra-types 2.10.5
pydub 0.25.1
Pygments 2.19.1
pynndescent 0.5.13
pyparsing 3.2.3
PySocks 1.7.1
python-dateutil 2.9.0.post0
python-dotenv 1.1.1
python-etcd 0.4.5
python-json-logger 3.3.0
python-multipart 0.0.12
pytorch-lightning 2.5.2
pytorch-wpe 0.0.1
pytz 2025.2
pyworld 0.3.4
PyYAML 6.0.2
pyzmq 27.0.0
ray 2.48.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rich 13.7.1
rich-toolkit 0.14.9
rignore 0.6.4
rpds-py 0.25.1
ruamel.yaml 0.18.12
ruamel.yaml.clib 0.2.8
ruff 0.12.5
safehttpx 0.1.6
safetensors 0.5.3
scikit-learn 1.7.1
scipy 1.16.1
semantic-version 2.10.0
sentencepiece 0.2.0
sentry-sdk 2.34.1
setuptools 65.5.0
shellingham 1.5.4
six 1.17.0
sniffio 1.3.1
sortedcontainers 2.4.0
soundfile 0.12.1
soupsieve 2.7
soxr 0.5.0.post1
stack_data 0.6.3
starlette 0.41.3
sympy 1.14.0
tensorboard 2.20.0
tensorboard-data-server 0.7.2
tensorboardX 2.6.4
tensorrt-cu12 10.13.0.35
tensorrt_cu12_bindings 10.13.0.35
tensorrt_cu12_libs 10.13.0.35
threadpoolctl 3.6.0
tiktoken 0.9.0
tokenizers 0.21.4
tomlkit 0.12.0
torch 2.7.1+cu128
torch-complex 0.4.4
torchaudio 2.7.1+cu128
torchelastic 0.2.2
torchmetrics 1.8.0
torchvision 0.22.1+cu128
tqdm 4.67.1
traitlets 5.14.3
transformers 4.54.1
triton 3.3.1
truststore 0.10.1
typeguard 4.4.4
typer 0.16.0
types-dataclasses 0.6.6
typing_extensions 4.14.0
tzdata 2025.2
umap-learn 0.5.9.post2
urllib3 2.4.0
uvicorn 0.30.0
uvloop 0.21.0
vllm 0.10.0
watchfiles 1.1.0
wcwidth 0.2.13
websockets 12.0
Werkzeug 3.1.3
wetext 0.0.8
wget 3.2
wheel 0.45.1
xformers 0.0.31
xgrammar 0.1.21
yarl 1.20.1
zipp 3.22.0
zstandard 0.23.0
`

WLimin avatar Jul 31 '25 01:07 WLimin

> I ran vllm_example.py and have no idea what it is doing. My version list is below. My experiments show that right after the model is first loaded, whether the generated speech is normal depends on the random seed. Once a random seed has been set, later generations are mostly normal, even when the seed value is changed. However, the built-in "中文女" (Chinese female) voice sounds female, and "中文男" (Chinese male) still sounds female...
>
> [pip list output identical to the previous comment]

Please try transformers==4.40.1

Uncolor-Duck avatar Jul 31 '25 02:07 Uncolor-Duck

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Aug 31 '25 02:08 github-actions[bot]

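I am running the CosyVoice2 model CosyVoice2-0.5B. Startup and synthesis complete without errors, but the speech comes out garbled. Versions: transformers 4.51.3, vllm 0.9.0. Even with the official versions the audio is still garbled. The model is loaded with `CosyVoice2(args.model_dir, load_jit=True, load_trt=True, load_vllm=True, fp16=True)`.

Are you seeing garbled audio on your side as well? My problem is described in this thread: #1601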
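To narrow down whether one of the acceleration options is involved, one could retry with the flags from the call above switched on a few at a time. A minimal sketch that only enumerates the keyword-argument combinations and does not load any model; `flag_combinations` is a hypothetical helper, the flag names are copied from the call above, and whether any flag is actually at fault is an assumption to test:

```python
from itertools import product

# Flag names as they appear in the CosyVoice2(...) call quoted above.
FLAGS = ("load_jit", "load_trt", "load_vllm", "fp16")


def flag_combinations(flags=FLAGS):
    """Return kwargs dicts, ordered from fewest to most enabled flags."""
    combos = [
        dict(zip(flags, values))
        for values in product((False, True), repeat=len(flags))
    ]
    # The plain configuration (everything off) comes first.
    return sorted(combos, key=lambda kwargs: sum(kwargs.values()))


for kwargs in flag_combinations():
    # Hypothetical next step: build the model with **kwargs, synthesize
    # the same sentence with a fixed seed, and listen to the result.
    print(kwargs)
```

Starting from the configuration with every flag off and enabling one flag at a time keeps the number of listening tests small.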

worm128 avatar Oct 12 '25 11:10 worm128

Thank you! I have received your e-mail. Best regards!

Yanceye avatar Oct 12 '25 11:10 Yanceye