CosyVoice issues

[Voice clone silence] After accelerating flow with TensorRT, sft works, but zero-shot synthesis produces only noise

1

After accelerating flow with TensorRT, sft works, but the audio synthesized in zero-shot mode is only noise. Text: 今天天气真的很好呀 ("The weather is really nice today") [zero_shot_500.wav.zip](https://github.com/user-attachments/files/17355003/zero_shot_500.wav.zip) TensorRT inference process: ![image](https://github.com/user-attachments/assets/3599705d-ce05-49ea-8a12-2991984b8e69) ![image](https://github.com/user-attachments/assets/ab07aaff-1928-4375-90e4-5175a49efbf9)

wang-TJ-20

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

9

Problem: when running via webui.py with the inference mode set to pretrained voice, clicking "Generate audio" fails and the server shows: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'. The full error output is as follows: 2024-09-27 16:32:01,942 INFO get sft inference request tn 我是通义实验室语音团队全新推出的生成式语音大模型，提供舒适自然的语音合成能力。 to 我是通义实验室语音团队全新推出的生成式语音大模型，提供舒适自然的语音合成能力。 0%| | 0/1 [00:00

willmyc1
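This error typically means half-precision (float16) tensors reached a matrix multiply on the CPU, which the PyTorch build in use does not implement. A minimal sketch of the usual guard, assuming the model is being run on CPU with fp16 enabled: choose float16 only for CUDA devices and fall back to float32 otherwise. `choose_dtype_name` is a hypothetical helper for illustration, not part of CosyVoice.

```python
def choose_dtype_name(device_type: str, want_half: bool = True) -> str:
    """Return the dtype name to use for inference on the given device type.

    Hypothetical helper: half precision is only selected for CUDA devices,
    because CPU kernels may not support float16 matrix multiplies.
    """
    if want_half and device_type == "cuda":
        return "float16"
    return "float32"


if __name__ == "__main__":
    print(choose_dtype_name("cpu"))   # float32
    print(choose_dtype_name("cuda"))  # float16
```

With PyTorch installed, the returned name can be resolved with `getattr(torch, name)` and applied with `model.to(dtype)`; on a CPU-only machine this amounts to keeping the model in float32.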

[Audio quality] Synthesized audio always starts with a beep

2

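When synthesizing audio, the text is split into segments, and the audio for every segment starts with a beep, which sounds odd. As the screenshot below shows, there is always a distinct beep just before each split text segment begins. ![image](https://github.com/user-attachments/assets/a32726de-3fa5-407b-814a-2b7f054f7852) [zero_shot_9222.wav.zip](https://github.com/user-attachments/files/17344112/zero_shot_9222.wav.zip)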

wang-TJ-20

Some errors when running webui.py

2

1. /project/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `**pip install peft**`. For this one, using pip install...

ZHUHF123

stale
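Note that the first item above is a `FutureWarning`, not an error, so on its own it does not stop webui.py. The message itself suggests installing PEFT. If the goal is only to keep the log readable, a minimal sketch using the standard `warnings` module is shown below; `silence_lora_deprecation` is a hypothetical helper name, and it assumes the warning is raised through Python's normal warning machinery after the filter is installed.

```python
import warnings


def silence_lora_deprecation() -> None:
    """Hide only FutureWarnings whose message mentions LoRACompatibleLinear."""
    warnings.filterwarnings(
        "ignore",
        message=r".*LoRACompatibleLinear.*",  # regex matched from the start of the message
        category=FutureWarning,
    )


if __name__ == "__main__":
    # Call this before the model is loaded so the filter is already in place.
    silence_lora_deprecation()
```

Other warnings, including other `FutureWarning`s, are left untouched, so real deprecations elsewhere still show up.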

How to upload/download pretrained voice-clone

3

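Could this work like gpt-sovits: generate a voice from several paired text samples, then produce a cache file that can be downloaded, so that speech can later be re-synthesized from that voice feature file?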

oovm

stale

Why are there no positional embeddings in the inputs to the flow decoder transformer layers?

3

In the CosyVoice flow model, the encoder Conformer layers contain positional embeddings, but I see no such addition in the decoder transformer layers. Does that mean they bring no benefit in flow matching? sorry...

JohnHerry

stale

zero_shot hangs when I try it, with no error message

3

**Describe the bug**
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio
cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
# zero_shot usage, for Chinese/English/Japanese/Cantonese/Korean
prompt_speech_16k = load_wav('./33.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物，那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐，笑容如花儿般绽放。',...

redpintings

stale
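When a call hangs without printing anything, a stack dump shows where it is stuck. A minimal sketch using the standard library's `faulthandler`, assuming the hang happens inside a Python call that can be wrapped in a function; `run_with_stack_dump` is a hypothetical helper, not part of CosyVoice.

```python
import faulthandler
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_stack_dump(fn: Callable[[], T], timeout_s: float = 60.0) -> T:
    """Run fn(); if it is still running after timeout_s seconds,
    write the stack of every thread to stderr without killing the process."""
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=sys.stderr)
    try:
        return fn()
    finally:
        # Cancel the pending dump once fn() has finished or raised.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    print(run_with_stack_dump(lambda: sum(range(10)), timeout_s=5.0))
```

To apply it to the report above, put the `inference_zero_shot` loop inside a function and pass that function in. The dumped traceback then points at the line that is blocking, for example model loading, audio I/O, or a device transfer.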

I get this error at runtime; how do I track down the cause?

5

2024-09-12 09:30:52,827 - modelscope - INFO - PyTorch version 2.0.1 Found. 2024-09-12 09:30:52,828 - modelscope - INFO - Loading ast index from C:\Users\Administrator\.cache\modelscope\ast_indexer 2024-09-12 09:30:52,950 - modelscope - INFO -...

woshi66

stale

How can I control the emotion of the output speech, e.g. happy, excited, sad, agitated?

2

Should something be entered in the "Input instruct text" field, or in the "Input synthesis text" field?

HUFUPAP

stale

Reinstalled a conda Python 3.11 environment; the current error is that torchaudio is missing

2

With the Python 3.8 environment I hit all kinds of errors. In the end, installing tts reported that Python 3.9 or later is required, so I reinstalled with a Python 3.11 environment, and now I get this error: (cosyvoicep11) G:\python\CosyVoicep11>python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M Traceback (most recent call last): File "G:\python\CosyVoicep11\webui.py", line 20, in...

woshi66

stale
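For a "module missing" error after rebuilding an environment, it helps to confirm which interpreter is actually running and whether that interpreter can see the package. A quick check using only the standard library, assuming the failure is an import error for `torchaudio` as the title says; `missing_modules` is a hypothetical helper name.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # The path printed here should point inside the intended conda environment.
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(["torch", "torchaudio"]))
```

Run it with the same command used to launch webui.py (here `python3`). If the interpreter path is outside the conda environment, or `torchaudio` is listed as missing, the package needs to be installed into the environment that command actually uses.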
