ukemamaster
Hi @WilliamLambertCN, did you solve this issue? If yes, could you please share your solution?
I had the same problem and solved it. In the script clean.py, replace line 72 with c_wav, g_c = self.segan.generate(pwav, device=device). Do not forget to define device beforehand: device = torch.device('cuda')
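In case the machine has no GPU, hard-coding 'cuda' will fail, so it may help to pick the device with a fallback. A minimal sketch, assuming PyTorch is installed; the tensor below is only a placeholder and is not the actual pwav from clean.py:

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder input (assumption): inputs must live on the same device as the model.
example = torch.zeros(1, 1, 16000).to(device)
print(device, example.device)
```

The same device object can then be passed wherever the script expects a device argument.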
I tried several times to re-cut the data into ranges from 0.5s to 20s, guaranteeing alignment with the corresponding text. But nothing improves. There might be a difference between model...
@bensonbs Have you fine-tuned the xtts-v2 model on your own dataset? Can you share a histogram of the audio lengths of your dataset? Have you tried to modify the...
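For anyone who wants to produce such a histogram quickly, here is a rough sketch using only the Python standard library. It assumes the clips are uncompressed PCM WAV files; the directory name "wavs", the 1-second bin width, and the helper names are placeholders, and the 0.5s to 20s bounds are simply the range mentioned above:

```python
import wave
from collections import Counter
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def length_histogram(durations, bin_width=1.0):
    """Count clips per bin; key k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(int(d // bin_width) for d in durations)


def out_of_range(durations, min_s=0.5, max_s=20.0):
    """Return the durations that fall outside [min_s, max_s]."""
    return [d for d in durations if not (min_s <= d <= max_s)]


def print_histogram(hist, bin_width=1.0):
    """Print one text bar per non-empty bin, shortest clips first."""
    for k in sorted(hist):
        start = k * bin_width
        print(f"{start:5.1f}-{start + bin_width:5.1f}s | {'#' * hist[k]}")


if __name__ == "__main__":
    # "wavs" is a placeholder directory name (assumption), not a path from this thread.
    durations = [wav_duration_seconds(p) for p in sorted(Path("wavs").glob("*.wav"))]
    print_histogram(length_histogram(durations))
    print("clips outside 0.5s-20s:", len(out_of_range(durations)))
```

Pasting the printed bars into a comment is usually enough to see whether the dataset is dominated by very short or very long clips.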