Jaehyeon Kim comments

Results: 16 comments by Jaehyeon Kim

One question about the decoder compared with FastSpeech and Tacotron.

I have no empirical evidence, but I think the difference comes from whether or not the model captures dependencies between output mel-spectrogram frames. The probabilistic modeling of each model is quite...

Any method that could make the result more natural?

If your concern is the prosody of the synthesized samples, such as intonation, techniques such as prosody embeddings and style tokens could be useful. In my ongoing experiments, such...

Ideal size of gin_channels for multiple speaker embeddings?

@echelon Hi echelon. As I haven't tested on such small datasets, I can't give you a solution. Sorry about that. In my case, I didn't care much about the dimension,...

Number of training steps

The 10000-epoch setting is not meaningful. You can reduce the number of epochs, or just stop training partway through. I trained my model with the base config on 2 V100 GPUs, and it took...

[ERROR] monotonic_align.core

Sorry, I don't think you have given enough information to track down the error. Could you give more details? For example, do you get any errors when you build the Cython code?

Wrong Implementation of Discriminator

While fixing those two problems in [a new branch](https://github.com/jaywalnut310/MelGAN-Pytorch/tree/research), I found that the synthesis quality is worse than before. There are three things that I changed: the 2 problems above, and melspectrogram...