How to train with instruction text

Open Ferdydh opened this issue 7 months ago • 3 comments

From what I can see, the LibriTTS example doesn't include training with instruction text; it only includes the TTS text. Could you show how this could be done for a new dataset? Thanks

Ferdydh avatar May 17 '25 15:05 Ferdydh

I'd assume by putting: <|endofprompt|>

but would a prompt without the explicit splitting via <|endofprompt|> here work?

For example, from Speechcraft's dataset: In the category of Relationships and Politics, reflecting on her curiosity, a calm adult female with high pitch and low volume ponders: "What could it contain?" Speaking at a slower pace, she ponders the possibilities.

Ferdydh avatar May 18 '25 13:05 Ferdydh

Yes. If you want to train an instruct TTS model, use prompt_text<|endofprompt|>tts_text in the prepared data, following the cosyvoice.inference_instruct2 data format.

aluminumbox avatar May 26 '25 03:05 aluminumbox
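
A minimal sketch of what that data preparation could look like. The helper names and the sample utterance are hypothetical, and the Kaldi-style `utt_id text` line layout is an assumption based on the LibriTTS example, not a confirmed requirement; only the `prompt_text<|endofprompt|>tts_text` pattern comes from the reply above.

```python
from typing import Iterable, List, Tuple

# Separator named in the reply above.
END_OF_PROMPT = "<|endofprompt|>"


def build_instruct_text(prompt_text: str, tts_text: str) -> str:
    """Join the instruction and the text to be spoken with the separator."""
    return f"{prompt_text.strip()}{END_OF_PROMPT}{tts_text.strip()}"


def format_text_lines(utterances: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Build one 'utt_id text' line per utterance.

    Each item is (utt_id, prompt_text, tts_text). The line layout is an
    assumption modelled on a Kaldi-style `text` file.
    """
    return [
        f"{utt_id} {build_instruct_text(prompt_text, tts_text)}"
        for utt_id, prompt_text, tts_text in utterances
    ]


def write_text_file(path: str, utterances: Iterable[Tuple[str, str, str]]) -> None:
    """Write the formatted lines to `path`, one utterance per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in format_text_lines(utterances):
            f.write(line + "\n")


if __name__ == "__main__":
    sample = [
        ("utt_0001",
         "A calm adult female with high pitch and low volume, speaking slowly.",
         "What could it contain?"),
    ]
    for line in format_text_lines(sample):
        print(line)
```

For a free-form description such as the Speechcraft one, this would mean splitting the style description from the quoted sentence yourself before calling `build_instruct_text`.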

@aluminumbox so it won't work without explicit <|endofprompt|>?

Ferdydh avatar May 26 '25 10:05 Ferdydh

Could someone share their experience on whether training without the explicit separator worked well for them? That'd be great.

Ferdydh avatar Jun 05 '25 21:06 Ferdydh

> yes, if you want to train an instruct tts model, use prompt_text<|endofprompt|>tts_text in the prepared data, follow cosyvoice.inference_instruct2 data format

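For streaming training, shouldn't the prompt_text<|endofprompt|> part be encoded as a whole and then prepended to every chunk after splitting? From what I can see, the current code seems to split it along with the rest as if it were ordinary text. Doesn't that make training and inference inconsistent?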

jokerlj92 avatar Jun 19 '25 19:06 jokerlj92
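
The concern above can be illustrated with a toy sketch. It does not reproduce CosyVoice's actual chunking code; the function names, token ids, and chunk size are all hypothetical and only contrast the two behaviours being described.

```python
from typing import List


def chunk_with_shared_prefix(
    prefix_ids: List[int], text_ids: List[int], chunk_size: int
) -> List[List[int]]:
    """Chunk only the TTS text and prepend the whole encoded prompt
    (instruction plus separator) to every chunk."""
    chunks = [text_ids[i:i + chunk_size] for i in range(0, len(text_ids), chunk_size)]
    return [prefix_ids + chunk for chunk in chunks]


def chunk_as_plain_text(
    prefix_ids: List[int], text_ids: List[int], chunk_size: int
) -> List[List[int]]:
    """Treat prompt and TTS text as one sequence and chunk it blindly,
    so the prompt can be cut apart and is seen only by the first chunks."""
    all_ids = prefix_ids + text_ids
    return [all_ids[i:i + chunk_size] for i in range(0, len(all_ids), chunk_size)]


if __name__ == "__main__":
    prefix = [100, 101, 999]   # hypothetical ids: instruction tokens + separator
    text = [1, 2, 3, 4, 5]     # hypothetical ids for the TTS text
    print(chunk_with_shared_prefix(prefix, text, chunk_size=2))
    print(chunk_as_plain_text(prefix, text, chunk_size=2))
```

In the first variant every chunk carries the full instruction; in the second the instruction is spread across the leading chunks. Whether the second matches what the training code does is exactly the open question in this comment.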