Christian Schäfer
Hi, by "robotic", do you mean the prosody or the voice quality (e.g. a metallic sound)? Do you have an example? An undertrained MelGAN usually has some hissing metallic sound that...
Sounds not bad imo, although I do not understand a single word :-). Is that a single sentence? Generally I feel that the model performs best if applied sentence by sentence....
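To illustrate the sentence-by-sentence idea, here is a minimal sketch that splits a text on sentence-ending punctuation before synthesis. The regex split is a naive assumption (it will break on abbreviations such as "e.g."), and `synthesize` in the note below is a hypothetical placeholder, not a function from the repo.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings (e.g. when the input itself is empty).
    return [p for p in parts if p]


if __name__ == "__main__":
    for sentence in split_sentences("Hello there. How are you? Fine!"):
        print(sentence)
```

Each element could then be passed to the model separately, e.g. `for s in split_sentences(text): synthesize(s)`, and the resulting audio concatenated.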
Ah cool. I highly recommend tweaking the MelGAN to use larger receptive fields, i.e. more layers for the resnet. I got a pretty good quality boost using 4-7 layers (successively...
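For a rough feel of how the layer count affects the receptive field, here is a small sketch. It assumes each residual layer uses kernel size 3 and a dilation of 3**i, as described in the original MelGAN paper; the actual settings in this repo may differ, and the function name is made up for illustration.

```python
def residual_stack_receptive_field(num_layers, kernel_size=3, dilation_base=3):
    """Receptive field (in timesteps) of a stack of dilated convolutions.

    Assumption: layer i has dilation dilation_base**i, and each layer
    widens the receptive field by (kernel_size - 1) * dilation.
    """
    receptive_field = 1
    for i in range(num_layers):
        dilation = dilation_base ** i
        receptive_field += (kernel_size - 1) * dilation
    return receptive_field


if __name__ == "__main__":
    for n in (3, 4, 7):
        print(n, residual_stack_receptive_field(n))
    # 3 27
    # 4 81
    # 7 2187
```

Under these assumptions every extra layer triples the receptive field of the stack.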
Hi, did you ensure that all the audio files were preprocessed before training? Because the preprocessing builds up a phoneme set from the training data. I'd suspect that you apply...
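One way to check for such a mismatch is to compare the symbols in an input text against the set collected during preprocessing. This is a generic sketch, not the repo's preprocessing code; both function names are hypothetical.

```python
def build_symbol_set(training_texts):
    """Collect every character that occurs in the training texts."""
    return {symbol for text in training_texts for symbol in text}


def unseen_symbols(text, symbols):
    """Return the characters of `text` that are not in `symbols`, sorted."""
    return sorted(set(text) - symbols)


if __name__ == "__main__":
    symbols = build_symbol_set(["abc", "cde"])
    print(unseen_symbols("abf", symbols))  # ['f']
```

An empty result means every symbol of the input was seen during preprocessing.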
Hi, thanks for the hint. I will update this if I have time :)
Hi, just to let you know, I am currently working on a multispeaker implementation that will be live soon. Fine-tuning is possible with about 5 minutes of fresh data.
Hi, yeah, I am currently implementing it in the branch below: https://github.com/as-ideas/ForwardTacotron/tree/feature/multispeaker It's probably going to be ready in 2 weeks or so. I am currently testing it on the...
Hi, yeah, I am going to implement both (ForwardTaco first, then FastPitch) - in my experience ForwardTaco actually performs better, but it may depend on the dataset...
Hi, multispeaker is merged and ready for testing. I tested it on a custom dataset, but as always with such large merges, there may be bugs - pls let me...
Hi, this is strange. Could you check your version of the phonemizer package? Also, you could run the cleaner test to see what the problem might be (probably need to...
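A quick way to check the installed version, using only the standard library (`importlib.metadata`, Python 3.8+). The helper name is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("phonemizer:", installed_version("phonemizer"))
```

Running `pip show phonemizer` in the same environment should report the same version.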