
learning loss explosion

Open ikpark09 opened this issue 1 year ago • 0 comments

First of all, thank you. The code you provided has been very helpful for studying TTS and HiFi-GAN.

When I train with the provided code on the original LJSpeech data, the loss explodes. I have tried adjusting the learning rate and other hyperparameters, but the problem persists, so I'd like to ask for advice.

Part of the training log and the error output are included below.

....
checkpoints directory : cp_hifigan
Epoch: 1
Steps : 0, Gen Loss Total : 101.349, Mel-Spec. Error : 2.058, s/b : 585.139
Steps : 5, Gen Loss Total : 135.118, Mel-Spec. Error : 2.176, s/b : 0.496
Steps : 10, Gen Loss Total : 106.757, Mel-Spec. Error : 1.938, s/b : 0.435
Steps : 15, Gen Loss Total : 94.055, Mel-Spec. Error : 1.701, s/b : 0.435
Steps : 20, Gen Loss Total : 141.385, Mel-Spec. Error : 1.857, s/b : 0.433
Steps : 25, Gen Loss Total : 196.452, Mel-Spec. Error : 3.813, s/b : 0.443
Steps : 30, Gen Loss Total : 64922832.000, Mel-Spec. Error : 1.813, s/b : 0.454
Steps : 35, Gen Loss Total : 199.692, Mel-Spec. Error : 3.854, s/b : 0.455
Steps : 40, Gen Loss Total : nan, Mel-Spec. Error : 2.021, s/b : 0.443
Steps : 45, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.449
Steps : 50, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.444
Steps : 55, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.436
Steps : 60, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.449
Steps : 65, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.452
Steps : 70, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.452
Steps : 75, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.464

The log above is from a training run with the default config. Gen Loss Total spikes at step 30 and becomes nan at step 40; from step 45 onward, all loss values are nan.
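To pin down where the run first goes wrong, the log can be scanned automatically. The following is only a minimal sketch: the function name `first_bad_step` and the threshold `limit` are hypothetical choices for illustration, and the regular expression assumes the entry format shown in the log above.

```python
import math
import re

# Pattern for one progress entry, assuming the format printed in the log above.
ENTRY = re.compile(
    r"Steps : (\d+), Gen Loss Total : (\S+), Mel-Spec\. Error : (\S+),"
)


def first_bad_step(log_text, limit=1e4):
    """Return (step, gen_loss, mel_error) for the first entry whose value is
    non-finite or larger than `limit` (a hypothetical threshold), else None."""
    for step, gen, mel in ENTRY.findall(log_text):
        gen_loss, mel_error = float(gen), float(mel)
        for value in (gen_loss, mel_error):
            if not math.isfinite(value) or abs(value) > limit:
                return int(step), gen_loss, mel_error
    return None


# A few entries copied from the log above.
log = (
    "Steps : 25, Gen Loss Total : 196.452, Mel-Spec. Error : 3.813, s/b : 0.443 "
    "Steps : 30, Gen Loss Total : 64922832.000, Mel-Spec. Error : 1.813, s/b : 0.454 "
    "Steps : 35, Gen Loss Total : 199.692, Mel-Spec. Error : 3.854, s/b : 0.455 "
    "Steps : 40, Gen Loss Total : nan, Mel-Spec. Error : 2.021, s/b : 0.443"
)

print(first_bad_step(log))  # (30, 64922832.0, 1.813)
```

On these entries the first anomaly is the spike at step 30, ten steps before the first nan, which suggests the divergence starts earlier than the point where nan appears.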

The following error occurs at step 1000.

Steps : 985, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.430
Steps : 990, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.431
Steps : 995, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.440
Steps : 1000, Gen Loss Total : nan, Mel-Spec. Error : nan, s/b : 0.427
Traceback (most recent call last):
  File "train.py", line 271, in <module>
    main()
  File "train.py", line 267, in main
    train(0, a, h)
  File "train.py", line 206, in train
    sw.add_audio('generated/y_hat_{}'.format(j), y_g_hat[0], steps, h.sampling_rate)
  File "/home/a/miniconda3/envs/hifi/lib/python3.8/site-packages/torch/utils/tensorboard/writer.py", line 669, in add_audio
    audio(tag, snd_tensor, sample_rate=sample_rate), global_step, walltime)
  File "/home/a/miniconda3/envs/hifi/lib/python3.8/site-packages/torch/utils/tensorboard/summary.py", line 444, in audio
    tensor_list = [int(32767.0 * x) for x in tensor]
  File "/home/a/miniconda3/envs/hifi/lib/python3.8/site-packages/torch/utils/tensorboard/summary.py", line 444, in <listcomp>
    tensor_list = [int(32767.0 * x) for x in tensor]
ValueError: cannot convert float NaN to integer
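The crash itself is a side effect of the nan values: the conversion `int(32767.0 * x)` shown in the traceback fails as soon as one generated sample is NaN. The sketch below reproduces that conversion in plain Python and adds a finiteness check in front of it. The helper names `to_int16_samples` and `is_loggable` are hypothetical; in `train.py` the equivalent check would presumably be something like `torch.isfinite(y_g_hat).all()` before calling `sw.add_audio`, which is an assumption and not part of the provided code. Such a guard only avoids the crash and does not address the loss explosion.

```python
import math


def to_int16_samples(samples):
    """Mirror the conversion from the traceback: int(32767.0 * x) per sample."""
    return [int(32767.0 * x) for x in samples]


def is_loggable(samples):
    """Return True only if every sample is a finite number."""
    return all(math.isfinite(x) for x in samples)


good = [0.0, 0.5, -1.0]
bad = [0.0, float("nan"), 0.25]  # stands in for a generator output containing NaN

for name, samples in (("good", good), ("bad", bad)):
    if is_loggable(samples):
        print(name, to_int16_samples(samples))
    else:
        print(name, "skipped: contains NaN or inf")
```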

I look forward to your reply. Thank you.

ikpark09, Jul 03 '23 02:07