Seungju
I've also tried it with our dataset, and it shows that it was able to generalize to unseen speakers. An interesting part was that even though I trained the vocoder with a Korean dataset,...
Most samples have quality similar to the samples from 6400 epochs; however, I found that the vocoder was vulnerable to background noise (such as clapping sounds).
Thanks for your reply! I guessed that strange artifacts like the ones below happen because of those hyperparameters. Haven't you seen those artifacts? I got them mainly on the front...
Well, I was training a new model from scratch using a Korean speech data corpus. It contains 300 hours of utterances from various speakers, and I was getting those artifacts after I...
@seungwonpark Sorry, but I couldn't find a note in the original paper saying that the batch size was carefully chosen. Also, I've been thinking that if we use a multi-speaker training scheme and use...
Is it obvious that MelGAN works best at batch size 16? I recalled the authors' mention, and now it sounds like they realized there are trade-offs between audio fidelity...
I also experienced the pronunciation problem. My case was worse, since the pronunciation degraded significantly even for long inputs. Have you solved this?
No, I didn't encounter that error. Can you give me more context?