hifi-gan
How to improve HiFi-GAN output mel spectrogram
Hello, thank you for your great work. I have trained a HiFi-GAN model using the output of a text-to-melspec model as its input. The results are great after training for 1M+ steps, but sometimes parts of the formants suddenly disappear within a phoneme, as you can see in the red rectangle in [pic2]. This makes the output speech sound coarse/robotic.
[pic1] Mel spectrogram from the text-to-melspec model:
[pic2] Mel spectrogram computed from the HiFi-GAN output audio, with the mel spectrogram above as input:
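To locate dropouts like the one in the red rectangle more precisely than by eye, one option is to compare the input mel with a mel re-computed from the generated audio and flag cells where the input had energy but the output lost it. Below is a minimal sketch. It assumes both arrays are natural-log mel spectrograms of shape `[n_mels, n_frames]` produced with the same mel extraction used for training. The function name `find_dropped_regions` and the `drop_threshold` / `energy_floor` values are hypothetical and would need tuning.

```python
import numpy as np


def find_dropped_regions(input_mel, output_mel, drop_threshold=1.5, energy_floor=-6.0):
    """Flag time-frequency cells where the output mel lost energy present in the input.

    Hypothetical helper: both arrays are assumed to be natural-log mel
    spectrograms with shape [n_mels, n_frames] and identical mel settings.
    """
    # Crop to a common length; re-analysed audio can differ by a frame or two.
    n_frames = min(input_mel.shape[1], output_mel.shape[1])
    inp = input_mel[:, :n_frames]
    out = output_mel[:, :n_frames]

    # A cell is "dropped" if the input has real energy there (above the floor)
    # and the output is more than drop_threshold log units quieter.
    mask = (inp > energy_floor) & ((inp - out) > drop_threshold)

    # Collapse to frames: a frame is flagged if any mel bin in it dropped.
    flagged = mask.any(axis=0)

    # Group consecutive flagged frames into half-open [start, end) regions.
    regions = []
    start = None
    for t, is_flagged in enumerate(flagged):
        if is_flagged and start is None:
            start = t
        elif not is_flagged and start is not None:
            regions.append((start, t))
            start = None
    if start is not None:
        regions.append((start, n_frames))
    return mask, regions


# Synthetic check: the output loses mel bins 20-29 over frames 40-54.
inp = np.full((80, 100), -2.0)
out = np.full((80, 102), -2.0)
out[20:30, 40:55] = -8.0
mask, regions = find_dropped_regions(inp, out)
print(regions)  # [(40, 55)]
```

Running this over several utterances would show whether the dropouts cluster in particular mel bands or phonemes, which can help tell a vocoder issue apart from a mismatch in the input mels.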
How can I fix this?