ukemamaster
Hi @WilliamLambertCN, did you solve this issue? If yes, could you please share your solution?
I had the same problem and I solved it. In the script clean.py, replace line 72 with c_wav, g_c = self.segan.generate(pwav, device=device). Do not forget to define device first: device = torch.device('cuda')
I tried several times to re-cut the data into ranges from 0.5s to 20s, guaranteeing alignment with the corresponding text. But nothing improves. There might be a difference between model...
@bensonbs Have you fine-tuned the xtts-v2 model on your own dataset? Can you share a histogram of the audio lengths of your dataset? Have you tried to modify the...
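For anyone who wants to produce such a histogram, here is a minimal sketch using only the Python standard library. It assumes the clips are uncompressed PCM WAV files; the `wavs` directory name, the helper names, and the 1-second bin width are placeholders chosen for illustration, not anything taken from the xtts-v2 tooling.

```python
import wave
from collections import Counter
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def duration_histogram(durations, bin_width=1.0):
    """Count clips per bin; each key is the lower edge of a bin in seconds."""
    counts = Counter()
    for duration in durations:
        counts[int(duration // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def print_histogram(histogram, bin_width=1.0):
    """Print one text bar per bin, e.g. '  3.0-  4.0s | ### (3)'."""
    for lower_edge, count in histogram.items():
        upper_edge = lower_edge + bin_width
        print(f"{lower_edge:5.1f}-{upper_edge:5.1f}s | {'#' * count} ({count})")


if __name__ == "__main__":
    # "wavs" is a hypothetical dataset directory; point this at your own clips.
    wav_paths = sorted(Path("wavs").glob("*.wav"))
    durations = [wav_duration_seconds(p) for p in wav_paths]
    print_histogram(duration_histogram(durations))
```

The text output can be pasted directly into a comment, which makes it easy to see whether most clips fall inside the range you intended (for example 0.5s to 20s).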
> I've had success training in Spanish with ~70 hours. But I'm getting an issue where proper nouns aren't being said properly. And the pronunciation isn't always ideal @rlenain Have...
> I've had success training in Spanish with ~70 hours. But I'm getting an issue where proper nouns aren't being said properly. And the pronunciation isn't always ideal Hi @rlenain...
> cosyvoice2 training recipe is not ready yet When is it expected to be released?
Any updates on this? Is the training recipe available?
@EmreOzkose Great. Can you share your fine-tuning experience and code (if possible)? Which new language did you train it for? Did you get the expected results? How about latency? Is it real...
@EmreOzkose Thanks for your detailed answer. How is the performance in Spanish? Can you please share your model weights and inference code? Actually, I tried with the pre-trained CosyVoice2-0.5B model,...