hifi-gan

Machine noise at 4 kHz

Open hdmjdp opened this issue 4 years ago • 8 comments

I encountered a bad case: there is a machine-like sound (visible as a line) at 4 kHz, as shown below. This is a breathing segment. I am trying to fix it. Can you give some advice? Thanks.

[image]

hdmjdp avatar Dec 31 '20 05:12 hdmjdp

It is difficult to comment because there is no information about the experiment. If you post the experiment settings, training steps, the original audio and the generated audio together, we may be able to give you some helpful comments.

jik876 avatar Jan 03 '21 06:01 jik876

@jik876 This is a Tacotron + HiFi-GAN model at 48 kHz, trained for 1M steps. Because this is a TTS result, there is no original audio. In fact, there is no machine sound at 4 kHz when we synthesize audio from the ground-truth mel. The TTS synthesis is attached: 54_generated_e2e.wav.zip
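One way to quantify the artifact described above is to compare the energy at 4 kHz with the energy in nearby bands, once for the audio synthesized from the ground-truth mel and once for the TTS output. The following is a minimal sketch using the Goertzel algorithm in plain Python. The function names, the 500 Hz neighbor offset, and the synthetic test signal are illustrative assumptions and are not part of the HiFi-GAN repository.

```python
import math
import random


def goertzel_power(samples, sample_rate, freq_hz):
    """Return the squared DFT magnitude at the bin nearest to freq_hz."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def tone_ratio(samples, sample_rate, freq_hz=4000.0, offset_hz=500.0):
    """Ratio of power at freq_hz to the mean power at two neighboring bands.

    A large ratio suggests a narrow tonal line at freq_hz.
    """
    center = goertzel_power(samples, sample_rate, freq_hz)
    below = goertzel_power(samples, sample_rate, freq_hz - offset_hz)
    above = goertzel_power(samples, sample_rate, freq_hz + offset_hz)
    return center / (0.5 * (below + above) + 1e-12)


def synth(sample_rate, n, tone_amp, seed=0):
    """Hypothetical test signal: weak Gaussian noise plus an optional 4 kHz tone."""
    rng = random.Random(seed)
    return [
        rng.gauss(0.0, 0.01)
        + tone_amp * math.sin(2.0 * math.pi * 4000.0 * i / sample_rate)
        for i in range(n)
    ]


with_tone = synth(48000, 4800, 0.05)
without_tone = synth(48000, 4800, 0.0)
print("ratio with tone:   ", tone_ratio(with_tone, 48000))
print("ratio without tone:", tone_ratio(without_tone, 48000))
```

For real audio, the samples would be read from the two WAV files and the ratio computed over the breathing segment only, since a ratio taken over the whole utterance may hide a line that appears only in low-energy regions.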

hdmjdp avatar Jan 04 '21 02:01 hdmjdp

Did you add the discriminator for period 2, which I recommended before?

jik876 avatar Jan 11 '21 02:01 jik876

Did you add the discriminator for period 2, which I recommended before?

Yes, I have added it.

hdmjdp avatar Jan 11 '21 07:01 hdmjdp

I hope your problem has already been solved. If not, try training the model for at least 500k steps, and check that the MPD periods are chosen to minimize overlap.
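As a rough illustration of the overlap check: a period-p discriminator looks at samples spaced p apart, so two periods that share a common factor see partly the same sample subsets. The HiFi-GAN paper uses the periods [2, 3, 5, 7, 11], which are pairwise coprime. The sketch below, with a hypothetical helper name, lists the pairs in a period set that share a factor.

```python
from itertools import combinations
from math import gcd


def overlapping_pairs(periods):
    """Return the pairs of periods that share a common factor greater than 1."""
    return [(p, q) for p, q in combinations(periods, 2) if gcd(p, q) > 1]


print(overlapping_pairs([2, 3, 5, 7, 11]))  # pairwise coprime periods
print(overlapping_pairs([2, 4, 6]))         # every pair shares the factor 2
```

An empty result means no two periods share a factor, which is one reasonable reading of "minimum overlap"; it does not by itself say anything about which periods suit a 48 kHz model.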

jik876 avatar Jan 18 '21 10:01 jik876

@jik876 I am trying a 5 ms frame shift instead of 12.5 ms, but a smaller frame shift may cause trouble for the acoustic model.
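To make the trade-off concrete, the arithmetic can be sketched as follows, assuming a 48 kHz sample rate as stated earlier in the thread. The helper names are hypothetical, and the upsample factors in the example are only one possible factorization, not a recommended configuration. In the HiFi-GAN generator, the product of the upsample rates has to equal the hop size, so changing the frame shift also means changing the upsampling configuration.

```python
from math import prod


def hop_size(sample_rate_hz, frame_shift_ms):
    """Convert a frame shift in milliseconds to a hop size in samples."""
    samples = sample_rate_hz * frame_shift_ms / 1000.0
    if abs(samples - round(samples)) > 1e-9:
        raise ValueError("frame shift does not give a whole number of samples")
    return round(samples)


def frames_per_second(frame_shift_ms):
    """Number of mel frames the acoustic model must predict per second."""
    return 1000.0 / frame_shift_ms


def upsampling_matches(upsample_rates, hop):
    """Check that the generator's total upsampling factor equals the hop size."""
    return prod(upsample_rates) == hop


for shift_ms in (12.5, 5.0):
    print(shift_ms, "ms:", hop_size(48000, shift_ms), "samples per frame,",
          frames_per_second(shift_ms), "frames per second")

# Hypothetical factorization of the 240-sample hop, for illustration only.
print(upsampling_matches([8, 6, 5], hop_size(48000, 5.0)))
```

At 48 kHz, moving from 12.5 ms to 5 ms reduces the hop from 600 to 240 samples and raises the frame rate from 80 to 200 frames per second, so the acoustic model has to predict 2.5 times as many frames for the same utterance.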

hdmjdp avatar Jan 19 '21 03:01 hdmjdp

@hdmjdp

Hi, have you solved this problem?

Alexey322 avatar Jan 20 '21 09:01 Alexey322

No, I just abandoned this 48 kHz speaker. The 16 kHz model of this speaker does not have this problem.

hdmjdp avatar Jan 21 '21 07:01 hdmjdp