
Pretrained voices are not shown in the webui when using the CosyVoice2-0.5B model

Open Jandown opened this issue 1 year ago • 21 comments

Other models work fine, but with --model_dir pretrained_models/CosyVoice2-0.5B the pretrained voices are not shown. What is causing this?

Jandown avatar Dec 17 '24 09:12 Jandown

Running into the same problem, looking for a solution.

anstonjie avatar Dec 17 '24 10:12 anstonjie

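Other models work fine, but with --model_dir pretrained_models/CosyVoice2-0.5B the pretrained voices are not shown. What is causing this?

Has this been solved? I'm running into the same problem.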

lukeewin avatar Dec 17 '24 10:12 lukeewin

Running into the same problem, looking for a solution.

Has this been solved? I'm running into the same problem.

lukeewin avatar Dec 17 '24 10:12 lukeewin

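The pretrained voice file simply isn't provided; it's not a display problem. You can wait for the SFT release, or use the spk2info.pt from the 1.0 model for now.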
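
The workaround above amounts to a single file copy. A minimal sketch follows, with a hypothetical helper name `copy_spk2info`; the source directory name used in the note below is an assumption, since the comment only says "the 1.0 model". Later replies in this thread report poor voice quality with a mismatched file, so treat this as a stopgap rather than a fix.

```python
import shutil
from pathlib import Path


def copy_spk2info(src_model_dir: str, dst_model_dir: str) -> Path:
    """Copy spk2info.pt from one model directory into another.

    Hypothetical helper for illustration only. It does not overwrite an
    existing file, so a later official spk2info.pt is left untouched.
    """
    src = Path(src_model_dir) / "spk2info.pt"
    dst = Path(dst_model_dir) / "spk2info.pt"
    if not src.is_file():
        raise FileNotFoundError(f"no spk2info.pt found in {src_model_dir}")
    if dst.exists():
        return dst  # already present, leave it as it is
    shutil.copy2(src, dst)
    return dst
```

For example, `copy_spk2info("pretrained_models/CosyVoice-300M-SFT", "pretrained_models/CosyVoice2-0.5B")` would do the copy if the 1.0 model lives in that (assumed) directory.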

RHOWL3 avatar Dec 17 '24 11:12 RHOWL3

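The pretrained voice file simply isn't provided; it's not a display problem. You can wait for the SFT release, or use the spk2info.pt from the 1.0 model for now.

Thanks, it works now.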

Jandown avatar Dec 19 '24 08:12 Jandown

https://github.com/FunAudioLLM/CosyVoice/issues/729#issuecomment-2545399338

  • #729

It appears that you're encountering an issue with the absence of the spk2info.pt file in the pretrained_models\CosyVoice2-0.5B directory, which is causing the webui.py script to report that the sft_spk variable is an empty list.

To resolve this, you should unzip the provided spk2info.zip file to obtain the spk2info.pt file. After extracting it, place the spk2info.pt file within the pretrained_models/CosyVoice2-0.5B directory. This file is essential for the model, as it contains critical speaker information required for its proper operation.
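
The unzip-and-place step can also be scripted. This is a sketch under stated assumptions: `extract_spk2info` is a hypothetical helper, and the internal layout of spk2info.zip is not described in the thread, so the code searches the archive for a member named spk2info.pt instead of assuming a fixed path.

```python
import zipfile
from pathlib import Path


def extract_spk2info(zip_path: str, model_dir: str) -> Path:
    """Extract spk2info.pt from an archive directly into model_dir.

    Hypothetical helper; the archive layout is an assumption, so the
    member is located by file name wherever it sits in the zip.
    """
    target = Path(model_dir) / "spk2info.pt"
    with zipfile.ZipFile(zip_path) as archive:
        matches = [n for n in archive.namelist() if Path(n).name == "spk2info.pt"]
        if not matches:
            raise FileNotFoundError(f"spk2info.pt not found inside {zip_path}")
        # Write the bytes directly so nested folders from the archive
        # are not recreated inside the model directory.
        target.write_bytes(archive.read(matches[0]))
    return target
```

After this, `pretrained_models/CosyVoice2-0.5B/spk2info.pt` should exist, which is the condition the comment above describes for the speaker list to be non-empty.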

chg0901 avatar Dec 19 '24 13:12 chg0901

#729 (comment)

It appears that you're encountering an issue with the absence of the spk2info.pt file in the pretrained_models\CosyVoice2-0.5B directory, which is causing the webui.py script to report that the sft_spk variable is an empty list.

To resolve this, you should unzip the provided spk2info.zip file to obtain the spk2info.pt file. After extracting it, place the spk2info.pt file within the pretrained_models/CosyVoice2-0.5B directory. This file is essential for the model, as it contains critical speaker information required for its proper operation.

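Thanks for the answer. Copying V1's spk2info.pt also works for me, but there is one problem: the male pretrained voices all generate female-sounding speech. Why is that? Also, does the 2-0.5B model not support natural language control? I've seen other people get it working.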

Jandown avatar Dec 19 '24 14:12 Jandown

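Also, does the 2-0.5B model not support natural language control? I've seen other people get it working.

It should be supported; the command line has this feature.

I think the webui.py code just hasn't been updated yet.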

O-O1024 avatar Dec 19 '24 22:12 O-O1024

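Also, does the 2-0.5B model not support natural language control? I've seen other people get it working.

It should be supported; the command line has this feature.

I think the webui.py code just hasn't been updated yet.

I tried it, and the message says only the 300M-Instruct model supports it.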

Jandown avatar Dec 20 '24 01:12 Jandown

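Using natural language control with the 2-0.5B model shows this message: "You are using natural language control mode. The pretrained_models/CosyVoice2-0.5B model does not support this mode. Please use the iic/CosyVoice-300M-Instruct model."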

Jandown avatar Dec 20 '24 01:12 Jandown

How can the voice learned from a prompt audio be saved as a standalone model?


chg0901 avatar Dec 20 '24 07:12 chg0901

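Using natural language control with the 2-0.5B model shows this message: "You are using natural language control mode. The pretrained_models/CosyVoice2-0.5B model does not support this mode. Please use the iic/CosyVoice-300M-Instruct model."

That is just a restriction imposed by the webui code.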

O-O1024 avatar Dec 20 '24 10:12 O-O1024

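Using natural language control with the 2-0.5B model shows this message: "You are using natural language control mode. The pretrained_models/CosyVoice2-0.5B model does not support this mode. Please use the iic/CosyVoice-300M-Instruct model."

That is just a restriction imposed by the webui code.

Do you have a solution?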

Jandown avatar Dec 20 '24 14:12 Jandown

#729 (comment)

It appears that you're encountering an issue with the absence of the spk2info.pt file in the pretrained_models\CosyVoice2-0.5B directory, which is causing the webui.py script to report that the sft_spk variable is an empty list.

To resolve this, you should unzip the provided spk2info.zip file to obtain the spk2info.pt file. After extracting it, place the spk2info.pt file within the pretrained_models/CosyVoice2-0.5B directory. This file is essential for the model, as it contains critical speaker information required for its proper operation.

It works

ptmax avatar Jan 17 '25 19:01 ptmax

#729 (comment)

It appears that you're encountering an issue with the absence of the spk2info.pt file in the pretrained_models\CosyVoice2-0.5B directory, which is causing the webui.py script to report that the sft_spk variable is an empty list.

To resolve this, you should unzip the provided spk2info.zip file to obtain the spk2info.pt file. After extracting it, place the spk2info.pt file within the pretrained_models/CosyVoice2-0.5B directory. This file is essential for the model, as it contains critical speaker information required for its proper operation.

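A question: is the spk2info provided here computed the same way as in the v1 release? Comparing the two spk2info.pt files, I found that the values differ.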

hildazzz avatar Feb 14 '25 09:02 hildazzz

Does anyone have a webui.py that works with version 2.0? The bundled one has too many restrictions.

Yhua1991 avatar Feb 20 '25 02:02 Yhua1991

Does anyone have a webui.py that works with version 2.0? The bundled one has too many restrictions. +1

mzhou1982 avatar Mar 14 '25 04:03 mzhou1982

Does anyone have a webui.py that works with version 2.0? The bundled one has too many restrictions. +1

import sys

import gradio as gr
import torch
import torchaudio

# third_party/Matcha-TTS must be on the path before cosyvoice is imported
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, fp16=False)

def generate_audio(audio_path, tts_text, instruct_text):
    # All three inputs are required
    if not audio_path or not tts_text or not instruct_text:
        return None

    # Load the reference audio at 16 kHz
    prompt_speech = load_wav(audio_path, 16000)

    # Generate audio and collect every segment the model yields
    segments = []
    for result in cosyvoice.inference_instruct2(
        tts_text,
        instruct_text,
        prompt_speech,
        stream=False
    ):
        segments.append(result['tts_speech'])

    if not segments:
        return None

    # Concatenate all segments along the time axis and save one file
    concatenated = torch.cat(segments, dim=1)
    output_path = "output_combined.wav"
    torchaudio.save(output_path, concatenated, cosyvoice.sample_rate)

    return output_path

with gr.Blocks(title="CosyVoice TTS") as app:
    gr.Markdown("## CosyVoice Speech Synthesis")

    with gr.Row():
        with gr.Column():
            ref_audio = gr.Audio(label="Reference audio", type="filepath")
            tts_text = gr.Textbox(label="Text to synthesize", placeholder="Enter the text to synthesize...")
            instruct_text = gr.Textbox(label="Style instruction", placeholder="Enter a voice style instruction...")
            generate_btn = gr.Button("Generate speech", variant="primary")

        with gr.Column():
            audio_output = gr.Audio(label="Result", interactive=False)

    generate_btn.click(
        fn=generate_audio,
        inputs=[ref_audio, tts_text, instruct_text],
        outputs=audio_output
    )

if __name__ == "__main__":
    app.launch(server_name="0.0.0.0", server_port=7860, share=False)

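This can serve as a stopgap. The style instruction must end with <|endofprompt|>. However, I found that even when the reference audio is a male voice, the output still sounds somewhat female.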
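
Typing the trailing marker by hand is easy to forget. A small sketch of how a script could append it automatically follows; `with_end_of_prompt` is a hypothetical helper name, and the marker string is taken from the note above.

```python
END_OF_PROMPT = "<|endofprompt|>"


def with_end_of_prompt(instruct_text: str) -> str:
    """Return the instruction with the end-of-prompt marker appended once.

    Hypothetical helper for illustration; surrounding whitespace is
    stripped and an existing trailing marker is not duplicated.
    """
    text = instruct_text.strip()
    if text.endswith(END_OF_PROMPT):
        return text
    return text + END_OF_PROMPT
```

A script like the one above could pass `with_end_of_prompt(instruct_text)` to the model instead of the raw textbox value.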

anitman avatar Mar 30 '25 09:03 anitman

....Email received, I will reply shortly.

Yhua1991 avatar Mar 30 '25 09:03 Yhua1991

#729 (comment)

It appears that you're encountering an issue with the absence of the spk2info.pt file in the pretrained_models\CosyVoice2-0.5B directory, which is causing the webui.py script to report that the sft_spk variable is an empty list.

To resolve this, you should unzip the provided spk2info.zip file to obtain the spk2info.pt file. After extracting it, place the spk2info.pt file within the pretrained_models/CosyVoice2-0.5B directory. This file is essential for the model, as it contains critical speaker information required for its proper operation.

Not working. The choices appear, but the generated voice is bad.

cskkx1 avatar May 14 '25 01:05 cskkx1

....Email received, I will reply shortly.

Yhua1991 avatar May 14 '25 01:05 Yhua1991

#729 (comment)

It appears that you're encountering an issue with the absence of the spk2info.pt file in the pretrained_models\CosyVoice2-0.5B directory, which is causing the webui.py script to report that the sft_spk variable is an empty list. To resolve this, you should unzip the provided spk2info.zip file to obtain the spk2info.pt file. After extracting it, place the spk2info.pt file within the pretrained_models/CosyVoice2-0.5B directory. This file is essential for the model, as it contains critical speaker information required for its proper operation.

Not working. The choices appear, but the generated voice is bad.

I am not surprised that it generated a bad voice.

As I understand it, CosyVoice 2 no longer uses the speaker embedding v in training the way CosyVoice 1 did. Calling inference_sft() this way loads a CosyVoice 2 backbone model together with a MISMATCHED CosyVoice 1 spk2info.pt. It runs and produces sound, but its behavior is unpredictable.

weiwchu avatar Jul 15 '25 09:07 weiwchu

....Email received, I will reply shortly.

Yhua1991 avatar Jul 15 '25 09:07 Yhua1991