Other languages
Thanks for your great work! I'm just wondering how large a dataset is recommended for training from scratch in other languages?
Thank you!
I've had success training in Spanish with ~70 hours of data. But I'm running into an issue where proper nouns aren't pronounced properly, and the pronunciation in general isn't always ideal.
Of course you can. Check the Whisper tokenizer and add <|your language|> at the start of each sentence.
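For anyone unsure what "add <|your language|> at sentence start" looks like in practice, here is a minimal sketch. It assumes the tag follows Whisper's `<|xx|>` language-token convention (for example `<|es|>` for Spanish). The helper name `add_language_tag` is hypothetical and not part of CosyVoice or Whisper; the right place to call it depends on your own data preparation code.

```python
import re

# Matches a Whisper-style language tag such as <|es|> at the start of a string.
LANG_TAG = re.compile(r"^<\|[a-z]{2,3}\|>")


def add_language_tag(text: str, lang: str) -> str:
    """Prepend a Whisper-style language tag unless one is already present.

    `add_language_tag` is a hypothetical helper for illustration only.
    """
    text = text.strip()
    if LANG_TAG.match(text):
        return text  # already tagged, leave untouched
    return f"<|{lang}|>{text}"


print(add_language_tag("Hola, ¿cómo estás?", "es"))
# → <|es|>Hola, ¿cómo estás?
```

Skipping text that is already tagged keeps the helper safe to run twice over the same transcripts.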
@aluminumbox I'm getting a weird issue in Spanish where proper nouns and uncommon words aren't pronounced properly. I think it might be a tokenizer issue. Do you have any idea how the BPE tokenizer would react to a new language, and why it would struggle with proper nouns and uncommon words?
We use the Whisper tokenizer; check cosyvoice.yaml. We also don't have enough experience with Spanish tokenization.
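One possible reason a BPE tokenizer handles proper nouns and uncommon words poorly is that it has learned no merges for them, so they fall apart into many tiny pieces. The toy sketch below shows that effect under stated assumptions: it works on characters (Whisper's tokenizer is byte-level), and the merge list `MERGES` is made up for illustration, not taken from Whisper. It is not a diagnosis of the Spanish issue above.

```python
def bpe_split(word, merges):
    """Apply BPE merges to one word; earlier merges have higher priority."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair with the best (lowest) merge rank.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merges, as if learned from text where "house" is frequent.
MERGES = [("t", "h"), ("th", "e"), ("i", "n"), ("in", "g"),
          ("h", "o"), ("ho", "u"), ("hou", "s"), ("hous", "e")]

print(bpe_split("house", MERGES))     # → ['house']
print(bpe_split("Zaragoza", MERGES))  # → ['Z', 'a', 'r', 'a', 'g', 'o', 'z', 'a']
```

A quick check on real data would be to run the actual tokenizer over the words that are mispronounced and compare their token counts with those of common words.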
Hello @rlenain, are you training only the llm model or also the flow model? And how much GPU resources did you use for the Spanish training?
Hi @aluminumbox, do you think it's better to train CosyVoice from scratch or just finetune the CosyVoice-300M base model if I want to train on a new language? Also, should I train both llm and flow if I want to finetune it?
@rlenain Have you solved this issue? Which models did you train? Could you please share your code?
If anyone knows of a guide to finetune or train our own models for other languages, please share a link. My idea is to evaluate how this project compares to other TTS solutions out there, then stick with one instead of having to use different solutions.
I see we have a lot of languages here that in theory we could train.
Hi @rlenain, what is your data format? And which code did you follow to add the Spanish language?
Hi @rlenain, I'm also trying to finetune. Could you share your steps?