hifi-gan
Machine noise at 4 kHz
I have encountered a bad case: there is a machine sound (a line in the spectrogram) at 4 kHz, as shown below. It occurs in a breathing segment. I am trying to fix it. Can you give some advice? Thanks.
It is difficult to comment because there is no information about the experiment. If you post the experiment settings, the number of training steps, and the original and generated audio together, we may be able to give you some helpful comments.
@jik876 This is a Tacotron + HiFi-GAN 48 kHz model, trained for 1M steps. Because this is a TTS result, there is no original audio. In fact, there is no machine sound at 4 kHz when we synthesize audio from the ground-truth mel. Below is the TTS output: 54_generated_e2e.wav.zip
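One way to quantify the line instead of judging it by eye is to compare the power in a narrow band around 4 kHz with the power in the neighbouring bands, separately for the ground-truth-mel resynthesis and the TTS output. The sketch below assumes NumPy and a mono float waveform; `narrowband_ratio` and its default band widths are hypothetical choices for illustration, not part of HiFi-GAN.

```python
import numpy as np


def narrowband_ratio(x, sr, center_hz=4000.0, half_width_hz=50.0, guard_hz=500.0):
    """Hypothetical diagnostic: mean power within +/- half_width_hz of center_hz,
    divided by mean power in the flanking bands out to +/- guard_hz.

    A value near 1 means no tonal line; values well above 1 suggest a line
    like the one visible in the spectrogram.
    """
    # Hann-windowed power spectrum of the whole segment
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    dist = np.abs(freqs - center_hz)
    band = dist <= half_width_hz
    flanks = (dist > half_width_hz) & (dist <= guard_hz)
    return spectrum[band].mean() / spectrum[flanks].mean()
```

Running it on the same breathing segment of both files would show whether the tone appears only when the vocoder is fed Tacotron-predicted mels.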
Did you add the discriminator for period 2 that I recommended before?
Yes, I have added it.
I hope your problem has already been solved. If not, try training the model for at least 500k steps, and check that the MPD periods are chosen to minimize overlap.
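To illustrate the overlap point: each MPD sub-discriminator folds the 1D waveform into a 2D grid whose width is its period. The HiFi-GAN paper uses periods 2, 3, 5, 7 and 11, primes chosen to avoid overlap as much as possible. The sketch below uses NumPy only; `reshape_by_period` and `period_overlaps` are hypothetical helpers, not the repository's API. It mirrors the reflect-pad-and-reshape step and flags period pairs where one period divides the other, which is the redundant case.

```python
import math
from itertools import combinations

import numpy as np


def reshape_by_period(x: np.ndarray, period: int) -> np.ndarray:
    """Fold a 1D waveform into a (frames, period) grid, as an MPD sub-discriminator does.

    The tail is reflect-padded so the length becomes a multiple of `period`.
    """
    remainder = len(x) % period
    if remainder:
        x = np.pad(x, (0, period - remainder), mode="reflect")
    return x.reshape(-1, period)


def period_overlaps(periods):
    """Return pairwise LCMs and the pairs where one period divides the other.

    Sub-discriminators with periods p and q share the same sample alignment
    every lcm(p, q) samples. If p divides q, every column of the period-q grid
    is a subsampled column of the period-p grid, so it adds little new structure.
    """
    lcms = {(p, q): math.lcm(p, q) for p, q in combinations(sorted(periods), 2)}
    nested = [(p, q) for p, q in lcms if q % p == 0]
    return lcms, nested


# Usage: prime periods have no nested pairs; even periods are highly redundant.
print(period_overlaps([2, 3, 5, 7, 11])[1])
print(period_overlaps([2, 4, 6, 8])[1])
```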
@jik876 I am trying a 5 ms frame shift instead of 12.5 ms, but a smaller frame shift may cause trouble for the acoustic model.
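The arithmetic behind the frame-shift change, as a rough sketch: at 48 kHz a 12.5 ms shift is a hop of 600 samples and 5 ms is 240 samples, so the acoustic model must predict 2.5 times as many frames per second (80 vs 200). In HiFi-GAN the product of `upsample_rates` must equal `hop_size`, so the generator config has to change as well. The helper names and the candidate rate lists below are hypothetical examples, not tested configurations.

```python
import math


def hop_samples(sample_rate: int, shift_ms: float) -> int:
    """Convert a frame shift in milliseconds into a hop size in samples."""
    hop = sample_rate * shift_ms / 1000
    if not float(hop).is_integer():
        raise ValueError(f"{shift_ms} ms is not a whole number of samples at {sample_rate} Hz")
    return int(hop)


def rates_match_hop(rates, hop_size: int) -> bool:
    """The generator's total upsampling factor must equal the hop size."""
    return math.prod(rates) == hop_size


def frames_per_second(sample_rate: int, hop_size: int) -> float:
    """How many mel frames the acoustic model must predict per second of audio."""
    return sample_rate / hop_size


# Hypothetical candidate factorizations for the two 48 kHz settings
for shift_ms, rates in [(12.5, [5, 5, 4, 3, 2]), (5, [5, 4, 4, 3])]:
    hop = hop_samples(48000, shift_ms)
    print(shift_ms, hop, frames_per_second(48000, hop), rates_match_hop(rates, hop))
```

Factorizations containing odd factors such as 3 and 5 also require checking `upsample_kernel_sizes` and padding, so that each transposed convolution produces exactly `rate` times its input length.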
@hdmjdp Hi, have you solved this problem?
No, I just abandoned the 48 kHz model for this speaker. The 16 kHz model of the same speaker does not have this problem.