How to improve HiFi-GAN output mel spectrogram

Open schnekk opened this issue 1 year ago • 0 comments

Hello, thank you for your great work. I have trained a HiFi-GAN model using the output of a text-to-melspec model as the input. The results are great after training for 1M+ steps, but sometimes parts of the formants suddenly disappear within a phoneme, as you can see in the red rectangle in [pic2]. This makes the output speech sound coarse/robotic.

[pic1] Mel spectrogram from the text-to-melspec model: tts_mel

[pic2] Mel spectrogram of the HiFi-GAN output, generated with the above mel spectrogram as input: hifigan_mel

How can I fix this?

schnekk, Jul 04 '23 11:07