audio-diffusion-pytorch-trainer

NaN after a number of epochs

Open gandolfxu opened this issue 1 year ago • 0 comments

  • Dataset: MAESTRO
  • Config: base_dataset_5.yaml
    • added at line 42: path: /root/dataset/maestro-v3.0.0/
    • changed: val_split: 0.02
  • Command:
    • python train.py exp=base_dataset_5 trainer.gpus=1

90%|████████▉ | 4004/4463 [1:02:54<07:05, 1.08it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4005/4463 [1:02:54<06:57, 1.10it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4005/4463 [1:02:55<06:57, 1.10it/s, epoch=11.9, loss=0.00485]
90%|████████▉ | 4006/4463 [1:02:55<06:55, 1.10it/s, epoch=11.9, loss=0.00485]
90%|████████▉ | 4006/4463 [1:02:56<06:55, 1.10it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4007/4463 [1:02:56<06:49, 1.11it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4007/4463 [1:02:57<06:49, 1.11it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4008/4463 [1:02:57<06:43, 1.13it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4008/4463 [1:02:58<06:43, 1.13it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4009/4463 [1:02:58<06:38, 1.14it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4009/4463 [1:02:58<06:38, 1.14it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4010/4463 [1:02:58<06:37, 1.14it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4010/4463 [1:02:59<06:37, 1.14it/s, epoch=11.9, loss=0.00942]
90%|████████▉ | 4011/4463 [1:02:59<06:41, 1.13it/s, epoch=11.9, loss=0.00942]
90%|████████▉ | 4011/4463 [1:03:00<06:41, 1.13it/s, epoch=11.9, loss=0.00496]
90%|████████▉ | 4012/4463 [1:03:00<06:48, 1.10it/s, epoch=11.9, loss=0.00496]
90%|████████▉ | 4012/4463 [1:03:01<06:48, 1.10it/s, epoch=11.9, loss=nan]
90%|████████▉ | 4013/4463 [1:03:01<06:51, 1.09it/s, epoch=11.9, loss=nan]
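To make the pattern in the progress log easier to see, here is a minimal sketch that pulls the step counter and the reported loss out of tqdm lines in the format posted above and lists the steps whose last reported loss is nan. It is not part of audio-diffusion-pytorch-trainer; the regex and the helper names `last_loss_per_step` and `nan_steps` are hypothetical, and it assumes the `step/total [...loss=value]` layout shown in this log. It scans the whole text rather than line by line, so it also works when several refreshes end up on one line.

```python
import math
import re
from typing import Dict, List

# Hypothetical pattern for tqdm updates such as
# "4006/4463 [1:02:56<06:55, 1.10it/s, epoch=11.9, loss=nan]".
UPDATE_RE = re.compile(r"(\d+)/\d+ \[[^\]]*?loss=([^\],]+)\]")


def last_loss_per_step(log_text: str) -> Dict[int, float]:
    """Map each step to the last loss value tqdm displayed for it.

    Keys keep the order in which steps first appear in the log.
    """
    losses: Dict[int, float] = {}
    for match in UPDATE_RE.finditer(log_text):
        step = int(match.group(1))
        # float() accepts the literal "nan", so NaN entries parse cleanly.
        losses[step] = float(match.group(2))
    return losses


def nan_steps(log_text: str) -> List[int]:
    """Return the steps whose last displayed loss is NaN."""
    return [step for step, loss in last_loss_per_step(log_text).items() if math.isnan(loss)]
```

On the log in this issue, finite losses (for example 0.00485 and 0.00942) still appear between nan values, which suggests the NaNs are intermittent rather than permanent. Attributing a value to an exact step is approximate, because tqdm refreshes the postfix separately from the step counter.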

gandolfxu · Apr 23 '23 02:04