CAP-VSTNet

Faced an unexpected pause while training at epoch 16160

Open · LT1st opened this issue 1 year ago · 7 comments

This is the console output:

Iteration: 00161080/00170000  content_loss:0.0000  lap_loss:0.3854  rec_loss:0.0622  style_loss:1.4862  loss_tmp:0.5256  loss_tmp_GT:0.0664
Iteration: 00161090/00170000  content_loss:0.0000  lap_loss:0.1441  rec_loss:0.1067  style_loss:0.7328  loss_tmp:0.2622  loss_tmp_GT:0.0847
Iteration: 00161100/00170000  content_loss:0.0000  lap_loss:0.0956  rec_loss:0.0610  style_loss:0.3879  loss_tmp:0.4483  loss_tmp_GT:0.0935
Iteration: 00161110/00170000  content_loss:0.0000  lap_loss:0.1170  rec_loss:0.0750  style_loss:0.6948  loss_tmp:0.2367  loss_tmp_GT:0.0769
Iteration: 00161120/00170000  content_loss:0.0000  lap_loss:0.0835  rec_loss:0.0324  style_loss:0.3586  loss_tmp:0.2265  loss_tmp_GT:0.0790
Iteration: 00161130/00170000  content_loss:0.0000  lap_loss:0.1715  rec_loss:0.0607  style_loss:1.0338  loss_tmp:1.1665  loss_tmp_GT:0.0691
Iteration: 00161140/00170000  content_loss:0.0000  lap_loss:0.1329  rec_loss:0.0573  style_loss:0.6555  loss_tmp:0.2451  loss_tmp_GT:0.0630
Iteration: 00161150/00170000  content_loss:0.0000  lap_loss:0.0865  rec_loss:0.0353  style_loss:0.3672  loss_tmp:0.2072  loss_tmp_GT:0.0798
Iteration: 00161160/00170000  content_loss:0.0000  lap_loss:0.1805  rec_loss:0.0556  style_loss:1.0472  loss_tmp:0.4310  loss_tmp_GT:0.0580
Iteration: 00161170/00170000  content_loss:0.0000  lap_loss:0.0714  rec_loss:0.0337  style_loss:0.4977  loss_tmp:0.5335  loss_tmp_GT:0.0828
Iteration: 00161180/00170000  content_loss:0.0000  lap_loss:0.1115  rec_loss:0.0504  style_loss:0.7589  loss_tmp:0.3220  loss_tmp_GT:0.0782
Iteration: 00161190/00170000  content_loss:0.0000  lap_loss:0.0688  rec_loss:0.0449  style_loss:0.3667  loss_tmp:0.3961  loss_tmp_GT:0.0545
Iteration: 00161200/00170000  content_loss:0.0000  lap_loss:0.0567  rec_loss:0.0391  style_loss:0.3564  loss_tmp:0.2393  loss_tmp_GT:0.0682
Iteration: 00161210/00170000  content_loss:0.0000  lap_loss:0.1973  rec_loss:0.3097  style_loss:0.3421  loss_tmp:0.2684  loss_tmp_GT:0.0742
Iteration: 00161220/00170000  content_loss:0.0000  lap_loss:0.1011  rec_loss:0.0443  style_loss:0.4991  loss_tmp:0.7559  loss_tmp_GT:0.0832
Iteration: 00161230/00170000  content_loss:0.0000  lap_loss:0.0907  rec_loss:0.0408  style_loss:0.3279  loss_tmp:0.2799  loss_tmp_GT:0.0609
Iteration: 00161240/00170000  content_loss:0.0000  lap_loss:0.1845  rec_loss:0.1205  style_loss:0.3565  loss_tmp:0.2985  loss_tmp_GT:0.0518
Iteration: 00161250/00170000  content_loss:0.0000  lap_loss:0.2289  rec_loss:0.1843  style_loss:0.3027  loss_tmp:0.2727  loss_tmp_GT:0.0621
Iteration: 00161260/00170000  content_loss:0.0000  lap_loss:0.3555  rec_loss:0.1109  style_loss:1.1843  loss_tmp:0.5432  loss_tmp_GT:0.0804
Iteration: 00161270/00170000  content_loss:0.0000  lap_loss:715.7004  rec_loss:0.9811  style_loss:49.2091  loss_tmp:8.3554  loss_tmp_GT:0.0722
Iteration: 00161280/00170000  content_loss:0.0000  lap_loss:0.3179  rec_loss:0.0679  style_loss:0.5367  loss_tmp:0.3266  loss_tmp_GT:0.0490
Iteration: 00161290/00170000  content_loss:0.0000  lap_loss:0.3358  rec_loss:0.1061  style_loss:0.6838  loss_tmp:0.5130  loss_tmp_GT:0.0722
Iteration: 00161300/00170000  content_loss:0.0000  lap_loss:0.3460  rec_loss:0.0656  style_loss:0.5438  loss_tmp:0.3704  loss_tmp_GT:0.0931
Iteration: 00161310/00170000  content_loss:0.0000  lap_loss:1190.3612  rec_loss:1.0076  style_loss:102.2295  loss_tmp:8.4687  loss_tmp_GT:0.0529
Iteration: 00161320/00170000  content_loss:0.0000  lap_loss:0.2564  rec_loss:0.0999  style_loss:0.4567  loss_tmp:0.3154  loss_tmp_GT:0.0887
Iteration: 00161330/00170000  content_loss:0.0000  lap_loss:0.3323  rec_loss:0.1052  style_loss:1.4866  loss_tmp:0.4579  loss_tmp_GT:0.0910
Iteration: 00161340/00170000  content_loss:0.0000  lap_loss:0.2228  rec_loss:0.0693  style_loss:0.3814  loss_tmp:0.2982  loss_tmp_GT:0.0956
Iteration: 00161350/00170000  content_loss:0.0000  lap_loss:0.3161  rec_loss:0.0936  style_loss:0.7369  loss_tmp:0.5142  loss_tmp_GT:0.0825
Iteration: 00161360/00170000  content_loss:0.0000  lap_loss:0.2863  rec_loss:0.0664  style_loss:0.7711  loss_tmp:0.3755  loss_tmp_GT:0.0543
Iteration: 00161370/00170000  content_loss:0.0000  lap_loss:0.2393  rec_loss:0.0665  style_loss:0.4124  loss_tmp:0.5033  loss_tmp_GT:0.0546
Iteration: 00161380/00170000  content_loss:0.0000  lap_loss:0.4465  rec_loss:0.0993  style_loss:0.8214  loss_tmp:0.3623  loss_tmp_GT:0.0508
Iteration: 00161390/00170000  content_loss:0.0000  lap_loss:0.3830  rec_loss:0.1114  style_loss:0.8339  loss_tmp:0.4083  loss_tmp_GT:0.0753
Iteration: 00161400/00170000  content_loss:0.0000  lap_loss:0.7490  rec_loss:0.0830  style_loss:1.9559  loss_tmp:0.5335  loss_tmp_GT:0.0926
Iteration: 00161410/00170000  content_loss:0.0000  lap_loss:0.4318  rec_loss:0.1619  style_loss:0.3361  loss_tmp:0.4007  loss_tmp_GT:0.0939
Iteration: 00161420/00170000  content_loss:0.0000  lap_loss:0.6868  rec_loss:0.0895  style_loss:0.9060  loss_tmp:1.2179  loss_tmp_GT:0.0785
Iteration: 00161430/00170000  content_loss:0.0000  lap_loss:2.0505  rec_loss:0.1317  style_loss:0.4949  loss_tmp:1.1039  loss_tmp_GT:0.0491
Iteration: 00161440/00170000  content_loss:0.0000  lap_loss:0.9979  rec_loss:0.1391  style_loss:1.0453  loss_tmp:0.6287  loss_tmp_GT:0.0558
Iteration: 00161450/00170000  content_loss:0.0000  lap_loss:1.2907  rec_loss:0.1996  style_loss:0.8235  loss_tmp:0.7697  loss_tmp_GT:0.0757
Iteration: 00161460/00170000  content_loss:0.0000  lap_loss:1.2174  rec_loss:0.2214  style_loss:0.8450  loss_tmp:0.8341  loss_tmp_GT:0.0556
Iteration: 00161470/00170000  content_loss:0.0000  lap_loss:1.5833  rec_loss:0.1535  style_loss:0.8611  loss_tmp:0.7469  loss_tmp_GT:0.0901
Iteration: 00161480/00170000  content_loss:0.0000  lap_loss:1.6554  rec_loss:0.1670  style_loss:0.7574  loss_tmp:0.7843  loss_tmp_GT:0.0714
Iteration: 00161490/00170000  content_loss:0.0000  lap_loss:1.5283  rec_loss:0.1308  style_loss:0.4994  loss_tmp:0.7239  loss_tmp_GT:0.0898
Iteration: 00161500/00170000  content_loss:0.0000  lap_loss:1.4131  rec_loss:0.1164  style_loss:1.0087  loss_tmp:0.6687  loss_tmp_GT:0.0719
Iteration: 00161510/00170000  content_loss:0.0000  lap_loss:1.3814  rec_loss:0.1189  style_loss:0.6020  loss_tmp:0.8305  loss_tmp_GT:0.0644
Iteration: 00161520/00170000  content_loss:0.0000  lap_loss:1.2963  rec_loss:0.1918  style_loss:0.7768  loss_tmp:0.6962  loss_tmp_GT:0.0777
Iteration: 00161530/00170000  content_loss:0.0000  lap_loss:1.3077  rec_loss:0.1180  style_loss:1.2366  loss_tmp:0.6606  loss_tmp_GT:0.0754
Iteration: 00161540/00170000  content_loss:0.0000  lap_loss:1.8840  rec_loss:0.1963  style_loss:0.6856  loss_tmp:0.8398  loss_tmp_GT:0.0790
Iteration: 00161550/00170000  content_loss:0.0000  lap_loss:37.4161  rec_loss:0.4983  style_loss:8.9974  loss_tmp:3.0548  loss_tmp_GT:0.0554
Iteration: 00161560/00170000  content_loss:0.0000  lap_loss:0.9423  rec_loss:0.1765  style_loss:0.5690  loss_tmp:0.9694  loss_tmp_GT:0.0606
Iteration: 00161570/00170000  content_loss:0.0000  lap_loss:0.8936  rec_loss:0.1570  style_loss:0.8383  loss_tmp:0.6511  loss_tmp_GT:0.0804
Iteration: 00161580/00170000  content_loss:0.0000  lap_loss:1.3945  rec_loss:0.4109  style_loss:0.8251  loss_tmp:0.9200  loss_tmp_GT:0.0933
Iteration: 00161590/00170000  content_loss:0.0000  lap_loss:1.4182  rec_loss:0.1355  style_loss:0.8771  loss_tmp:0.7025  loss_tmp_GT:0.0806
Iteration: 00161600/00170000  content_loss:0.0000  lap_loss:2.0692  rec_loss:0.2017  style_loss:0.4177  loss_tmp:0.8337  loss_tmp_GT:0.0946
Iteration: 00161610/00170000  content_loss:0.0000  lap_loss:397.9501  rec_loss:1.3158  style_loss:46.0847  loss_tmp:9.6579  loss_tmp_GT:0.0553
Iteration: 00161620/00170000  content_loss:0.0000  lap_loss:4.0763  rec_loss:0.3362  style_loss:0.6721  loss_tmp:1.3075  loss_tmp_GT:0.0777
Iteration: 00161630/00170000  content_loss:0.0000  lap_loss:10.1120  rec_loss:0.5004  style_loss:1.2209  loss_tmp:1.7422  loss_tmp_GT:0.0882
Iteration: 00161640/00170000  content_loss:0.0000  lap_loss:5.8842  rec_loss:0.3475  style_loss:0.8090  loss_tmp:1.5759  loss_tmp_GT:0.0754
Iteration: 00161650/00170000  content_loss:0.0000  lap_loss:7.2984  rec_loss:0.3977  style_loss:1.8986  loss_tmp:1.9794  loss_tmp_GT:0.0659
Iteration: 00161660/00170000  content_loss:0.0000  lap_loss:16.9144  rec_loss:0.5072  style_loss:1.3880  loss_tmp:3.1661  loss_tmp_GT:0.0762
Iteration: 00161670/00170000  content_loss:0.0000  lap_loss:8.6051  rec_loss:0.4152  style_loss:0.9209  loss_tmp:1.7897  loss_tmp_GT:0.0745
Iteration: 00161680/00170000  content_loss:0.0000  lap_loss:18.7265  rec_loss:0.7623  style_loss:1.7309  loss_tmp:2.4511  loss_tmp_GT:0.0596
Iteration: 00161690/00170000  content_loss:0.0000  lap_loss:26.2579  rec_loss:0.9497  style_loss:3.5073  loss_tmp:3.2847  loss_tmp_GT:0.0746
Iteration: 00161700/00170000  content_loss:0.0000  lap_loss:40.5071  rec_loss:1.2338  style_loss:4.2289  loss_tmp:4.3614  loss_tmp_GT:0.0877
Iteration: 00161710/00170000  content_loss:0.0000  lap_loss:75.8527  rec_loss:1.7527  style_loss:7.1905  loss_tmp:6.1432  loss_tmp_GT:0.0622
Iteration: 00161720/00170000  content_loss:0.0000  lap_loss:132.4727  rec_loss:3.4893  style_loss:10.6970  loss_tmp:7.7992  loss_tmp_GT:0.0869
Iteration: 00161730/00170000  content_loss:0.0000  lap_loss:164.3445  rec_loss:2.3470  style_loss:10.5640  loss_tmp:9.1285  loss_tmp_GT:0.0617
Iteration: 00161740/00170000  content_loss:0.0000  lap_loss:163.4563  rec_loss:1.8969  style_loss:9.1780  loss_tmp:10.1247  loss_tmp_GT:0.0708
Iteration: 00161750/00170000  content_loss:0.0000  lap_loss:418.0580  rec_loss:6.6835  style_loss:18.3620  loss_tmp:14.3527  loss_tmp_GT:0.0823
Iteration: 00161760/00170000  content_loss:0.0000  lap_loss:599.1832  rec_loss:9.9018  style_loss:54.3852  loss_tmp:16.5558  loss_tmp_GT:0.0779
Iteration: 00161770/00170000  content_loss:0.0000  lap_loss:1377.3221  rec_loss:11.8466  style_loss:88.2415  loss_tmp:20.6850  loss_tmp_GT:0.0696
Iteration: 00161780/00170000  content_loss:0.0000  lap_loss:1219.5043  rec_loss:13.1540  style_loss:73.4607  loss_tmp:25.3794  loss_tmp_GT:0.0756
Iteration: 00161790/00170000  content_loss:0.0000  lap_loss:5094.0000  rec_loss:9.5246  style_loss:221.0599  loss_tmp:33.7047  loss_tmp_GT:0.0947
Iteration: 00161800/00170000  content_loss:0.0000  lap_loss:1941.8975  rec_loss:31.7524  style_loss:250.0916  loss_tmp:68.2838  loss_tmp_GT:0.0637
Iteration: 00161810/00170000  content_loss:0.0000  lap_loss:3418.7014  rec_loss:20.3584  style_loss:232.5759  loss_tmp:42.6371  loss_tmp_GT:0.0911
Iteration: 00161820/00170000  content_loss:0.0000  lap_loss:418235104.0000  rec_loss:214.8812  style_loss:9325154.0000  loss_tmp:4957.1558  loss_tmp_GT:0.0785
Iteration: 00161830/00170000  content_loss:0.0000  lap_loss:9133.0684  rec_loss:72.6381  style_loss:698.3317  loss_tmp:58.0809  loss_tmp_GT:0.0579
Iteration: 00161840/00170000  content_loss:0.0000  lap_loss:9114.7314  rec_loss:48.7064  style_loss:624.2943  loss_tmp:56.9594  loss_tmp_GT:0.0858
Iteration: 00161850/00170000  content_loss:0.0000  lap_loss:16554.5078  rec_loss:104.8364  style_loss:1542.5042  loss_tmp:88.9630  loss_tmp_GT:0.0712
Iteration: 00161860/00170000  content_loss:0.0000  lap_loss:10247.7246  rec_loss:65.9900  style_loss:1027.9641  loss_tmp:96.4846  loss_tmp_GT:0.0727
Iteration: 00161870/00170000  content_loss:0.0000  lap_loss:19196.0527  rec_loss:77.2881  style_loss:1428.4135  loss_tmp:125.3436  loss_tmp_GT:0.0677
Iteration: 00161880/00170000  content_loss:0.0000  lap_loss:216289.6719  rec_loss:98.5644  style_loss:14655.6758  loss_tmp:218.4098  loss_tmp_GT:0.0702
Iteration: 00161890/00170000  content_loss:0.0000  lap_loss:19604.2520  rec_loss:50.9366  style_loss:1325.8600  loss_tmp:95.0826  loss_tmp_GT:0.0942
Iteration: 00161900/00170000  content_loss:0.0000  lap_loss:93659.1016  rec_loss:297.3892  style_loss:5191.9561  loss_tmp:227.0423  loss_tmp_GT:0.0611
Iteration: 00161910/00170000  content_loss:0.0000  lap_loss:86273.3594  rec_loss:174.2626  style_loss:4537.2666  loss_tmp:169.1176  loss_tmp_GT:0.0884
Iteration: 00161920/00170000  content_loss:0.0000  lap_loss:100730.4844  rec_loss:231.1616  style_loss:8772.8340  loss_tmp:295.8207  loss_tmp_GT:0.0779
Iteration: 00161930/00170000  content_loss:0.0000  lap_loss:389786.8125  rec_loss:618.0142  style_loss:26791.2461  loss_tmp:366.0463  loss_tmp_GT:0.0742
Iteration: 00161940/00170000  content_loss:0.0000  lap_loss:15906467840.0000  rec_loss:14955.5361  style_loss:303860064.0000  loss_tmp:72561.6250  loss_tmp_GT:0.0740
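The spikes at iterations 161270, 161310, 161550, and 161610, followed by the runaway growth after ~161700, look like the optimizer stepped through a batch with a huge or non-finite gradient and the run then diverged. One common guard (I don't know whether CAP-VSTNet's train.py has one; this is only an illustrative stdlib sketch with a hypothetical `should_step` helper and an arbitrary threshold) is to skip the optimizer update whenever the loss is non-finite or implausibly large:

```python
import math

def should_step(loss_value, threshold=1e4):
    """Return True only when the loss is finite and below a sanity threshold.

    threshold is an illustrative cutoff; real training code would tune it
    to the loss scale seen during healthy iterations.
    """
    return math.isfinite(loss_value) and loss_value < threshold

# Values taken from the log above:
print(should_step(0.3586))            # healthy batch -> step
print(should_step(15906467840.0))     # exploded lap_loss -> skip
print(should_step(float("nan")))      # NaN after overflow -> skip
```

In a real loop the check would wrap `optimizer.step()`, so a single bad batch cannot poison the weights.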

nvidia-smi reports the following:

Tue Feb 13 11:46:41 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:5E:00.0  On |                  N/A |
| 51%   38C    P2   103W / 350W |  13277MiB / 24576MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:AF:00.0 Off |                  N/A |
|  0%   39C    P8    15W / 350W |      5MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     16179      G   /usr/lib/xorg/Xorg                153MiB |
|    0   N/A  N/A    610515      C   python                          13120MiB |
|    1   N/A  N/A     16179      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Should I give up training?
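For context: losses this large usually mean the run has diverged, and the usual recovery is to restart from the last checkpoint saved before iteration ~161270, possibly with a lower learning rate and/or gradient clipping (in PyTorch, `torch.nn.utils.clip_grad_norm_`). Below is a minimal pure-Python sketch of the global-norm clipping that call performs, using a flat list of floats purely for illustration:

```python
import math

def clip_global_norm(grads, max_norm=1.0):
    """Rescale gradients so their global L2 norm is at most max_norm.

    Mirrors the idea behind torch.nn.utils.clip_grad_norm_: compute the
    norm over ALL gradients, then scale every gradient by the same factor.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total

clipped, norm = clip_global_norm([300.0, -400.0], max_norm=1.0)
print(norm)       # 500.0: the pre-clip global norm
print(clipped)    # [0.6, -0.8]: rescaled to unit norm
```

Because a single scale factor is applied to every gradient, clipping bounds the step size without changing the update direction.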

LT1st · Feb 13 '24