hifigan
NaN during training when using own dataset
While fine-tuning works as expected, regular training on a dataset other than LJSpeech eventually produces a NaN loss.
The culprit appears to be the following line, which causes a division by zero if wav happens to contain perfect silence:
https://github.com/bshall/hifigan/blob/374a4569eae5437e2c80d27790ff6fede9fc1c46/hifigan/dataset.py#L106
I'm not sure what the best solution would be. As a quick fix, I simply clamped the divisor so it can't reach zero:
wav = flip * gain * wav / max([wav.abs().max(), 0.001])
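For anyone who wants to reproduce this in isolation, here is a minimal sketch, assuming wav is a PyTorch tensor. The helper name normalize_peak and the eps parameter are made up for illustration and are not part of the repository; flip and gain stand in for the values used on the line above.

```python
import torch


def normalize_peak(wav: torch.Tensor, gain: float = 1.0, flip: float = 1.0,
                   eps: float = 1e-3) -> torch.Tensor:
    """Peak-normalize wav to `gain`, guarding against an all-zero input.

    Hypothetical helper: the name and `eps` are assumptions, not repo code.
    """
    # clamp keeps the divisor at or above eps, so silence maps to zeros, not NaN
    peak = wav.abs().max().clamp(min=eps)
    return flip * gain * wav / peak


silence = torch.zeros(4)
unguarded = silence / silence.abs().max()  # 0 / 0 gives NaN in every element
guarded = normalize_peak(silence)          # stays all zeros
```

This only covers the perfect-silence case. Clips whose peak is below eps will come out quieter than gain, which is probably acceptable for near-silent audio.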
I ran into the same issue!