Seungju
I've also tried it with our dataset, and it shows that it was able to generalize to unseen speakers. An interesting part was that even though I trained the vocoder with a Korean dataset,...
Most samples have quality similar to the samples from 6400 epochs; however, I found that the vocoder was vulnerable to background noise (such as clapping sounds).
Thanks for your reply! I guessed that strange artifacts like the ones below happen because of those hyperparameters. Haven't you seen those artifacts? I got them mainly on the front...
Well, I was training a new model from scratch using a Korean speech data corpus. It contains 300 hours of utterances from various speakers, and I was getting those artifacts after I...
@seungwonpark Sorry, but I couldn't find a note in the original paper saying that the batch size was carefully chosen. Also, I've been thinking that if we use a multi-speaker training scheme and use...
Is it obvious that MelGAN works best at batch size 16? I recalled the authors' mention, and now it sounds like they realized there are trade-offs between audio fidelity...
I also experienced the pronunciation problem. My case was worse, since the pronunciation degraded significantly even for long inputs. Have you solved this?
No, I didn't encounter that error. Can you give me more context?