ParallelWaveGAN

HifiGAN training -- obvious harmonics in test files

Open Kristopher-Chen opened this issue 3 years ago • 7 comments

I trained HifiGAN on the multi-speaker VCTK dataset at a 24 kHz sampling rate. I also normalize the input log-Mel spectrogram (mean = -4, std = 4). The generated test files show obvious harmonics, as in the image below. Have you ever seen this? Any suggestions? Thank you! [image]
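A minimal sketch of the normalization mentioned above, for readers who want to reproduce the setup. It assumes a fixed global mean of -4 and standard deviation of 4 applied to every log-Mel value, as stated in the post; the function names and the nested-list layout (frames of Mel bins) are hypothetical and not taken from either code base.

```python
# Hypothetical helper names; the constants are the ones quoted in the post.
MEAN = -4.0
STD = 4.0


def normalize_log_mel(log_mel, mean=MEAN, std=STD):
    """Shift and scale every log-Mel value: (x - mean) / std."""
    return [[(v - mean) / std for v in frame] for frame in log_mel]


def denormalize_log_mel(norm_mel, mean=MEAN, std=STD):
    """Invert the normalization: x * std + mean."""
    return [[v * std + mean for v in frame] for frame in norm_mel]


# Two frames with two Mel bins each, already in the log domain.
frames = [[-4.0, 0.0], [-8.0, -12.0]]
normalized = normalize_log_mel(frames)
restored = denormalize_log_mel(normalized)
```

Whatever statistics are used, the same values have to be applied at training and at inference time. A mismatch between the two is one possible source of audible artifacts, though nothing in this thread confirms that it was the cause here.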

Kristopher-Chen, Mar 14 '22 03:03

BTW, the discriminator losses are quite small. Could that mean the discriminators are too strong? [image]
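To make "small discriminator loss" concrete, here is a sketch of a least-squares GAN discriminator loss, which is the formulation described in the HiFi-GAN paper. It is an assumption that both code bases discussed here compute it exactly this way; scaling and averaging may differ. The function names and the flat list of scores are hypothetical simplifications.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def lsgan_discriminator_loss(real_scores, fake_scores):
    """Least-squares discriminator loss for one sub-discriminator.

    Real samples are pushed toward a score of 1, generated samples toward 0.
    """
    real_loss = mean([(1.0 - r) ** 2 for r in real_scores])
    fake_loss = mean([f ** 2 for f in fake_scores])
    return real_loss + fake_loss


# A discriminator that separates real and generated audio perfectly.
perfect = lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0])

# A discriminator that cannot tell them apart and outputs 0.5 everywhere.
undecided = lsgan_discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

Under this formulation a loss near 0 means the discriminator tells real and generated audio apart almost perfectly, while a fully undecided discriminator gives 0.5.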

Kristopher-Chen, Mar 14 '22 03:03

Did you use this repository, or is this a general question about HiFi-GAN?

kan-bayashi, Mar 14 '22 11:03

Hi, I actually referred to both your repository and the official implementation. I trained for several epochs with the official code and found the discriminator losses to be around 0.1~0.2, but the output still had obvious harmonics. So I wonder whether this just happens in the early stages of training? But the discriminator losses are quite strange...

Kristopher-Chen, Mar 15 '22 03:03

OK. How many iterations did you run? In my experiments, around 200k iterations are enough to generate reasonable speech. I'm not familiar with the official implementation, but in my case the official optimizer settings did not work well. The following issue may help you: https://github.com/kan-bayashi/ParallelWaveGAN/issues/278

kan-bayashi, Mar 15 '22 05:03

There seems to be something wrong with the discriminators. The losses get smaller as training runs for more epochs. Normal values would be around 0.1~0.2 for each discriminator, but mine are as shown below. [image]

Kristopher-Chen, Apr 06 '22 09:04

@Kristopher-Chen have you resolved the problems?

MlWoo, Feb 22 '23 09:02

When I followed the original code, this problem was solved.

As for the discriminator losses, the 2nd and 3rd MSD losses easily become small, while the others look normal.

Moreover, the feature map loss keeps growing gradually. But interestingly, the generated samples sound natural...
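For context on the feature map loss, here is a sketch of a feature-matching loss as an L1 distance between discriminator feature maps of real and generated audio, summed over layers. This follows the general description in the HiFi-GAN paper; the function name, the flattened per-layer lists, and the per-layer averaging are illustrative assumptions, not the exact code of either implementation.

```python
def feature_matching_loss(real_feats, fake_feats):
    """Sum over layers of the mean absolute difference between feature maps.

    real_feats and fake_feats are lists of layers, each layer a flat list of
    activations taken from the same discriminator for real and generated audio.
    """
    total = 0.0
    for real_layer, fake_layer in zip(real_feats, fake_feats):
        diffs = [abs(r - f) for r, f in zip(real_layer, fake_layer)]
        total += sum(diffs) / len(diffs)
    return total


# Two layers: the first differs in one of two activations, the second by 0.5.
real = [[1.0, 2.0], [0.0]]
fake = [[0.0, 2.0], [0.5]]
loss = feature_matching_loss(real, fake)
```

Because the features come from discriminators that keep changing during training, the absolute value of this loss is not directly comparable across iterations. That is one possible reading of why it can grow while samples still sound natural, offered here as a hypothesis rather than a confirmed explanation.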

Kristopher-Chen, May 11 '23 01:05