Does chat.sample_audio_speaker(wav) extract a voice timbre from an audio file? Can it replace the speaker generated by chat.sample_random_speaker()?

Open flystarhe opened this issue 1 year ago • 4 comments

Code:

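    import torchaudio
    import ChatTTS

    # Setup is not shown in the original post; assumed from the ChatTTS README.
    chat = ChatTTS.Chat()
    chat.load(compile=False)

    # `i` and `text` come from the poster's surrounding code, which is not shown.
    filename = f"chattts-rand-speaker-{i:03d}.se.wav"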

    wav, sample_rate = torchaudio.load(filename)
    wav = wav[0]

    speaker = chat.sample_audio_speaker(wav)

    params_infer_code = ChatTTS.Chat.InferCodeParams(
        # spk_emb=speaker,   # add sampled speaker
        temperature=0.3,   # using custom temperature
        top_P=0.7,         # top P decode
        top_K=20,          # top K decode
        show_tqdm=False,   # no tqdm
        manual_seed=1234,  # seed
    )

    params_refine_text = ChatTTS.Chat.RefineTextParams(
        prompt="[oral_0][laugh_0][break_6]",
        show_tqdm=False,
        manual_seed=1234,
    )

    wavs = chat.infer(
        [text],
        params_refine_text=params_refine_text,
        params_infer_code=params_infer_code,
    )

Error:

File /opt/conda/envs/dev-chattts/lib/python3.11/site-packages/ChatTTS/core.py:220, in Chat.infer(self, text, stream, lang, skip_refine_text, refine_text_only, use_decoder, do_text_normalization, do_homophone_replacement, params_refine_text, params_infer_code)
    218     return res_gen
    219 else:
--> 220     return next(res_gen)

StopIteration: 

flystarhe, Oct 12 '24 08:10
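The traceback above ends at `return next(res_gen)`, and `next()` raises `StopIteration` whenever the generator it is given finishes without yielding anything. The sketch below reproduces only that mechanism in plain Python. The names `empty_results` and `first_result` are hypothetical stand-ins, not part of ChatTTS, and the sketch does not say why the inference generator produced no output in this case.

```python
from typing import Iterator, List, Optional, Tuple


def empty_results() -> Iterator[List[float]]:
    """Hypothetical stand-in for a generator that finishes without yielding."""
    return
    yield  # unreachable; its presence makes this function a generator


def first_result(gen: Iterator[List[float]]) -> Optional[List[float]]:
    """Return the first item, or None when the generator yields nothing."""
    return next(gen, None)


def demo() -> Tuple[bool, Optional[List[float]]]:
    """Show the bare next() failure and the guarded alternative."""
    try:
        next(empty_results())
        raised = False
    except StopIteration:
        raised = True
    return raised, first_result(empty_results())
```

Passing a default to `next()` turns the empty case into a value the caller can check, which may be easier to debug than a bare `StopIteration`.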

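No. The two are used differently and are encoded differently. chat.sample_random_speaker() produces timbre information whose length never changes. chat.sample_audio_speaker(wav) instead produces a token encoding of the audio, and its length grows with the length of the audio.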

fumiama, Oct 15 '24 15:10
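Since the two values are encoded differently, they would presumably go into different inference parameters. The sketch below is an assumption, not something confirmed in this thread: the field names `spk_smp` and `txt_smp` are taken from my reading of the ChatTTS README's zero-shot example, and the helper `speaker_fields` is hypothetical. Requiring a transcript alongside the audio code is also an assumption of this sketch.

```python
from typing import Dict, Optional


def speaker_fields(
    random_speaker: Optional[str] = None,
    audio_sample_code: Optional[str] = None,
    sample_transcript: Optional[str] = None,
) -> Dict[str, str]:
    """Map each kind of speaker value to the parameter it is assumed to belong to."""
    fields: Dict[str, str] = {}
    if random_speaker is not None:
        # Fixed-length timbre value from chat.sample_random_speaker().
        fields["spk_emb"] = random_speaker
    if audio_sample_code is not None:
        if not sample_transcript:
            # Assumption of this sketch: the sample's transcript must accompany it.
            raise ValueError("the sample audio's transcript is required with its code")
        # Assumed field names, taken from the ChatTTS README rather than this thread.
        fields["spk_smp"] = audio_sample_code
        fields["txt_smp"] = sample_transcript
    return fields
```

With ChatTTS installed, the result could be unpacked as `ChatTTS.Chat.InferCodeParams(**speaker_fields(...), temperature=0.3)`, so the output of `chat.sample_audio_speaker(wav)` is never passed as `spk_emb`.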

A quick question: is there a low-cost way to clone a voice with ChatTTS? I used GPT-SoVITS before, and its stability was erratic.

zeushera140, Nov 30 '24 08:11

Someone has already implemented fine-tuning for this. You can try https://github.com/warmshao/ChatTTSPlus

fumiama, Dec 05 '24 05:12

Thanks, I'll take a look.

zeushera140, Dec 06 '24 11:12