How to train with instruction text
From what I can see, the LibriTTS example doesn't cover training with instruction text; it only includes the TTS text. Could you show how this could be done for a new dataset? Thanks
I'd assume by putting:
but would a prompt without the explicit splitting via <|endofprompt|> here work?
For example, from Speechcraft's dataset: In the category of Relationships and Politics, reflecting on her curiosity, a calm adult female with high pitch and low volume ponders: "What could it contain?" Speaking at a slower pace, she ponders the possibilities.
Yes. If you want to train an instruct TTS model, use prompt_text<|endofprompt|>tts_text in the prepared data, following the cosyvoice.inference_instruct2 data format.
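A minimal sketch of how the text field could be assembled, assuming the prepared data uses a Kaldi-style "utt_id text" line per utterance as in the LibriTTS example. The helper names (build_instruct_text, format_text_lines) and the sample utterance ID are hypothetical and not part of CosyVoice; only the prompt_text<|endofprompt|>tts_text layout comes from the reply above.

```python
# Sketch only: helper names are hypothetical, not CosyVoice APIs.
END_OF_PROMPT = "<|endofprompt|>"


def build_instruct_text(instruction: str, tts_text: str) -> str:
    """Join the instruction and the spoken text with the separator token."""
    instruction = instruction.strip()
    tts_text = tts_text.strip()
    # Guard against the separator already being present in either field,
    # which would make the split point ambiguous.
    if END_OF_PROMPT in instruction or END_OF_PROMPT in tts_text:
        raise ValueError("separator token must not appear inside a field")
    return f"{instruction}{END_OF_PROMPT}{tts_text}"


def format_text_lines(samples):
    """Turn (utt_id, instruction, tts_text) triples into 'utt_id text' lines.

    Assumption: one line per utterance, utterance ID first, then the text.
    """
    return [
        f"{utt_id} {build_instruct_text(instruction, tts_text)}"
        for utt_id, instruction, tts_text in samples
    ]


samples = [
    (
        "utt_0001",  # hypothetical utterance ID
        "A calm adult female with high pitch and low volume, speaking slowly.",
        "What could it contain?",
    ),
]
lines = format_text_lines(samples)
for line in lines:
    print(line)
```

With a dataset like Speechcraft, where the description and the spoken sentence are mixed in one caption, the instruction and the spoken text would first need to be separated so that only the spoken part follows the separator.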
@aluminumbox so it won't work without explicit <|endofprompt|>?
Could someone share their experience of whether training without the explicit separation worked well for them? That'd be great.
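For streaming training, shouldn't the prompt_text<|endofprompt|> part be encoded as a whole and then prepended to every chunk after splitting? From what I can see, the current code just splits it together with the rest of the text as if it were ordinary text. Doesn't that make training and inference inconsistent?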
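To make the two behaviours contrasted in this question concrete, here is a toy sketch. It makes no claim about what the CosyVoice code actually does; the token lists, chunk size, and function names are all hypothetical and only illustrate the difference between splitting the prompt together with the text and prepending the whole prompt to each text chunk.

```python
# Toy illustration only: not CosyVoice code. Tokens are plain strings here.

def chunk(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def split_together(prompt_tokens, text_tokens, size):
    """Treat prompt and text as one sequence and chunk it as a whole."""
    return chunk(prompt_tokens + text_tokens, size)


def prefix_each_chunk(prompt_tokens, text_tokens, size):
    """Chunk only the text, then prepend the full prompt to every chunk."""
    return [prompt_tokens + c for c in chunk(text_tokens, size)]


prompt_tokens = ["P1", "P2", "<|endofprompt|>"]  # hypothetical prompt tokens
text_tokens = ["t1", "t2", "t3", "t4"]           # hypothetical text tokens

together = split_together(prompt_tokens, text_tokens, 2)
prefixed = prefix_each_chunk(prompt_tokens, text_tokens, 2)

print(together)
print(prefixed)
```

In the first variant the prompt is spread across chunks and later chunks never see it; in the second, every chunk carries the full prompt and separator.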