CAP-VSTNet
I hit an unexpected pause while training, at epoch 16160. This is the console output:
Iteration: 00161080/00170000 content_loss:0.0000 lap_loss:0.3854 rec_loss:0.0622 style_loss:1.4862 loss_tmp:0.5256 loss_tmp_GT:0.0664
Iteration: 00161090/00170000 content_loss:0.0000 lap_loss:0.1441 rec_loss:0.1067 style_loss:0.7328 loss_tmp:0.2622 loss_tmp_GT:0.0847
Iteration: 00161100/00170000 content_loss:0.0000 lap_loss:0.0956 rec_loss:0.0610 style_loss:0.3879 loss_tmp:0.4483 loss_tmp_GT:0.0935
Iteration: 00161110/00170000 content_loss:0.0000 lap_loss:0.1170 rec_loss:0.0750 style_loss:0.6948 loss_tmp:0.2367 loss_tmp_GT:0.0769
Iteration: 00161120/00170000 content_loss:0.0000 lap_loss:0.0835 rec_loss:0.0324 style_loss:0.3586 loss_tmp:0.2265 loss_tmp_GT:0.0790
Iteration: 00161130/00170000 content_loss:0.0000 lap_loss:0.1715 rec_loss:0.0607 style_loss:1.0338 loss_tmp:1.1665 loss_tmp_GT:0.0691
Iteration: 00161140/00170000 content_loss:0.0000 lap_loss:0.1329 rec_loss:0.0573 style_loss:0.6555 loss_tmp:0.2451 loss_tmp_GT:0.0630
Iteration: 00161150/00170000 content_loss:0.0000 lap_loss:0.0865 rec_loss:0.0353 style_loss:0.3672 loss_tmp:0.2072 loss_tmp_GT:0.0798
Iteration: 00161160/00170000 content_loss:0.0000 lap_loss:0.1805 rec_loss:0.0556 style_loss:1.0472 loss_tmp:0.4310 loss_tmp_GT:0.0580
Iteration: 00161170/00170000 content_loss:0.0000 lap_loss:0.0714 rec_loss:0.0337 style_loss:0.4977 loss_tmp:0.5335 loss_tmp_GT:0.0828
Iteration: 00161180/00170000 content_loss:0.0000 lap_loss:0.1115 rec_loss:0.0504 style_loss:0.7589 loss_tmp:0.3220 loss_tmp_GT:0.0782
Iteration: 00161190/00170000 content_loss:0.0000 lap_loss:0.0688 rec_loss:0.0449 style_loss:0.3667 loss_tmp:0.3961 loss_tmp_GT:0.0545
Iteration: 00161200/00170000 content_loss:0.0000 lap_loss:0.0567 rec_loss:0.0391 style_loss:0.3564 loss_tmp:0.2393 loss_tmp_GT:0.0682
Iteration: 00161210/00170000 content_loss:0.0000 lap_loss:0.1973 rec_loss:0.3097 style_loss:0.3421 loss_tmp:0.2684 loss_tmp_GT:0.0742
Iteration: 00161220/00170000 content_loss:0.0000 lap_loss:0.1011 rec_loss:0.0443 style_loss:0.4991 loss_tmp:0.7559 loss_tmp_GT:0.0832
Iteration: 00161230/00170000 content_loss:0.0000 lap_loss:0.0907 rec_loss:0.0408 style_loss:0.3279 loss_tmp:0.2799 loss_tmp_GT:0.0609
Iteration: 00161240/00170000 content_loss:0.0000 lap_loss:0.1845 rec_loss:0.1205 style_loss:0.3565 loss_tmp:0.2985 loss_tmp_GT:0.0518
Iteration: 00161250/00170000 content_loss:0.0000 lap_loss:0.2289 rec_loss:0.1843 style_loss:0.3027 loss_tmp:0.2727 loss_tmp_GT:0.0621
Iteration: 00161260/00170000 content_loss:0.0000 lap_loss:0.3555 rec_loss:0.1109 style_loss:1.1843 loss_tmp:0.5432 loss_tmp_GT:0.0804
Iteration: 00161270/00170000 content_loss:0.0000 lap_loss:715.7004 rec_loss:0.9811 style_loss:49.2091 loss_tmp:8.3554 loss_tmp_GT:0.0722
Iteration: 00161280/00170000 content_loss:0.0000 lap_loss:0.3179 rec_loss:0.0679 style_loss:0.5367 loss_tmp:0.3266 loss_tmp_GT:0.0490
Iteration: 00161290/00170000 content_loss:0.0000 lap_loss:0.3358 rec_loss:0.1061 style_loss:0.6838 loss_tmp:0.5130 loss_tmp_GT:0.0722
Iteration: 00161300/00170000 content_loss:0.0000 lap_loss:0.3460 rec_loss:0.0656 style_loss:0.5438 loss_tmp:0.3704 loss_tmp_GT:0.0931
Iteration: 00161310/00170000 content_loss:0.0000 lap_loss:1190.3612 rec_loss:1.0076 style_loss:102.2295 loss_tmp:8.4687 loss_tmp_GT:0.0529
Iteration: 00161320/00170000 content_loss:0.0000 lap_loss:0.2564 rec_loss:0.0999 style_loss:0.4567 loss_tmp:0.3154 loss_tmp_GT:0.0887
Iteration: 00161330/00170000 content_loss:0.0000 lap_loss:0.3323 rec_loss:0.1052 style_loss:1.4866 loss_tmp:0.4579 loss_tmp_GT:0.0910
Iteration: 00161340/00170000 content_loss:0.0000 lap_loss:0.2228 rec_loss:0.0693 style_loss:0.3814 loss_tmp:0.2982 loss_tmp_GT:0.0956
Iteration: 00161350/00170000 content_loss:0.0000 lap_loss:0.3161 rec_loss:0.0936 style_loss:0.7369 loss_tmp:0.5142 loss_tmp_GT:0.0825
Iteration: 00161360/00170000 content_loss:0.0000 lap_loss:0.2863 rec_loss:0.0664 style_loss:0.7711 loss_tmp:0.3755 loss_tmp_GT:0.0543
Iteration: 00161370/00170000 content_loss:0.0000 lap_loss:0.2393 rec_loss:0.0665 style_loss:0.4124 loss_tmp:0.5033 loss_tmp_GT:0.0546
Iteration: 00161380/00170000 content_loss:0.0000 lap_loss:0.4465 rec_loss:0.0993 style_loss:0.8214 loss_tmp:0.3623 loss_tmp_GT:0.0508
Iteration: 00161390/00170000 content_loss:0.0000 lap_loss:0.3830 rec_loss:0.1114 style_loss:0.8339 loss_tmp:0.4083 loss_tmp_GT:0.0753
Iteration: 00161400/00170000 content_loss:0.0000 lap_loss:0.7490 rec_loss:0.0830 style_loss:1.9559 loss_tmp:0.5335 loss_tmp_GT:0.0926
Iteration: 00161410/00170000 content_loss:0.0000 lap_loss:0.4318 rec_loss:0.1619 style_loss:0.3361 loss_tmp:0.4007 loss_tmp_GT:0.0939
Iteration: 00161420/00170000 content_loss:0.0000 lap_loss:0.6868 rec_loss:0.0895 style_loss:0.9060 loss_tmp:1.2179 loss_tmp_GT:0.0785
Iteration: 00161430/00170000 content_loss:0.0000 lap_loss:2.0505 rec_loss:0.1317 style_loss:0.4949 loss_tmp:1.1039 loss_tmp_GT:0.0491
Iteration: 00161440/00170000 content_loss:0.0000 lap_loss:0.9979 rec_loss:0.1391 style_loss:1.0453 loss_tmp:0.6287 loss_tmp_GT:0.0558
Iteration: 00161450/00170000 content_loss:0.0000 lap_loss:1.2907 rec_loss:0.1996 style_loss:0.8235 loss_tmp:0.7697 loss_tmp_GT:0.0757
Iteration: 00161460/00170000 content_loss:0.0000 lap_loss:1.2174 rec_loss:0.2214 style_loss:0.8450 loss_tmp:0.8341 loss_tmp_GT:0.0556
Iteration: 00161470/00170000 content_loss:0.0000 lap_loss:1.5833 rec_loss:0.1535 style_loss:0.8611 loss_tmp:0.7469 loss_tmp_GT:0.0901
Iteration: 00161480/00170000 content_loss:0.0000 lap_loss:1.6554 rec_loss:0.1670 style_loss:0.7574 loss_tmp:0.7843 loss_tmp_GT:0.0714
Iteration: 00161490/00170000 content_loss:0.0000 lap_loss:1.5283 rec_loss:0.1308 style_loss:0.4994 loss_tmp:0.7239 loss_tmp_GT:0.0898
Iteration: 00161500/00170000 content_loss:0.0000 lap_loss:1.4131 rec_loss:0.1164 style_loss:1.0087 loss_tmp:0.6687 loss_tmp_GT:0.0719
Iteration: 00161510/00170000 content_loss:0.0000 lap_loss:1.3814 rec_loss:0.1189 style_loss:0.6020 loss_tmp:0.8305 loss_tmp_GT:0.0644
Iteration: 00161520/00170000 content_loss:0.0000 lap_loss:1.2963 rec_loss:0.1918 style_loss:0.7768 loss_tmp:0.6962 loss_tmp_GT:0.0777
Iteration: 00161530/00170000 content_loss:0.0000 lap_loss:1.3077 rec_loss:0.1180 style_loss:1.2366 loss_tmp:0.6606 loss_tmp_GT:0.0754
Iteration: 00161540/00170000 content_loss:0.0000 lap_loss:1.8840 rec_loss:0.1963 style_loss:0.6856 loss_tmp:0.8398 loss_tmp_GT:0.0790
Iteration: 00161550/00170000 content_loss:0.0000 lap_loss:37.4161 rec_loss:0.4983 style_loss:8.9974 loss_tmp:3.0548 loss_tmp_GT:0.0554
Iteration: 00161560/00170000 content_loss:0.0000 lap_loss:0.9423 rec_loss:0.1765 style_loss:0.5690 loss_tmp:0.9694 loss_tmp_GT:0.0606
Iteration: 00161570/00170000 content_loss:0.0000 lap_loss:0.8936 rec_loss:0.1570 style_loss:0.8383 loss_tmp:0.6511 loss_tmp_GT:0.0804
Iteration: 00161580/00170000 content_loss:0.0000 lap_loss:1.3945 rec_loss:0.4109 style_loss:0.8251 loss_tmp:0.9200 loss_tmp_GT:0.0933
Iteration: 00161590/00170000 content_loss:0.0000 lap_loss:1.4182 rec_loss:0.1355 style_loss:0.8771 loss_tmp:0.7025 loss_tmp_GT:0.0806
Iteration: 00161600/00170000 content_loss:0.0000 lap_loss:2.0692 rec_loss:0.2017 style_loss:0.4177 loss_tmp:0.8337 loss_tmp_GT:0.0946
Iteration: 00161610/00170000 content_loss:0.0000 lap_loss:397.9501 rec_loss:1.3158 style_loss:46.0847 loss_tmp:9.6579 loss_tmp_GT:0.0553
Iteration: 00161620/00170000 content_loss:0.0000 lap_loss:4.0763 rec_loss:0.3362 style_loss:0.6721 loss_tmp:1.3075 loss_tmp_GT:0.0777
Iteration: 00161630/00170000 content_loss:0.0000 lap_loss:10.1120 rec_loss:0.5004 style_loss:1.2209 loss_tmp:1.7422 loss_tmp_GT:0.0882
Iteration: 00161640/00170000 content_loss:0.0000 lap_loss:5.8842 rec_loss:0.3475 style_loss:0.8090 loss_tmp:1.5759 loss_tmp_GT:0.0754
Iteration: 00161650/00170000 content_loss:0.0000 lap_loss:7.2984 rec_loss:0.3977 style_loss:1.8986 loss_tmp:1.9794 loss_tmp_GT:0.0659
Iteration: 00161660/00170000 content_loss:0.0000 lap_loss:16.9144 rec_loss:0.5072 style_loss:1.3880 loss_tmp:3.1661 loss_tmp_GT:0.0762
Iteration: 00161670/00170000 content_loss:0.0000 lap_loss:8.6051 rec_loss:0.4152 style_loss:0.9209 loss_tmp:1.7897 loss_tmp_GT:0.0745
Iteration: 00161680/00170000 content_loss:0.0000 lap_loss:18.7265 rec_loss:0.7623 style_loss:1.7309 loss_tmp:2.4511 loss_tmp_GT:0.0596
Iteration: 00161690/00170000 content_loss:0.0000 lap_loss:26.2579 rec_loss:0.9497 style_loss:3.5073 loss_tmp:3.2847 loss_tmp_GT:0.0746
Iteration: 00161700/00170000 content_loss:0.0000 lap_loss:40.5071 rec_loss:1.2338 style_loss:4.2289 loss_tmp:4.3614 loss_tmp_GT:0.0877
Iteration: 00161710/00170000 content_loss:0.0000 lap_loss:75.8527 rec_loss:1.7527 style_loss:7.1905 loss_tmp:6.1432 loss_tmp_GT:0.0622
Iteration: 00161720/00170000 content_loss:0.0000 lap_loss:132.4727 rec_loss:3.4893 style_loss:10.6970 loss_tmp:7.7992 loss_tmp_GT:0.0869
Iteration: 00161730/00170000 content_loss:0.0000 lap_loss:164.3445 rec_loss:2.3470 style_loss:10.5640 loss_tmp:9.1285 loss_tmp_GT:0.0617
Iteration: 00161740/00170000 content_loss:0.0000 lap_loss:163.4563 rec_loss:1.8969 style_loss:9.1780 loss_tmp:10.1247 loss_tmp_GT:0.0708
Iteration: 00161750/00170000 content_loss:0.0000 lap_loss:418.0580 rec_loss:6.6835 style_loss:18.3620 loss_tmp:14.3527 loss_tmp_GT:0.0823
Iteration: 00161760/00170000 content_loss:0.0000 lap_loss:599.1832 rec_loss:9.9018 style_loss:54.3852 loss_tmp:16.5558 loss_tmp_GT:0.0779
Iteration: 00161770/00170000 content_loss:0.0000 lap_loss:1377.3221 rec_loss:11.8466 style_loss:88.2415 loss_tmp:20.6850 loss_tmp_GT:0.0696
Iteration: 00161780/00170000 content_loss:0.0000 lap_loss:1219.5043 rec_loss:13.1540 style_loss:73.4607 loss_tmp:25.3794 loss_tmp_GT:0.0756
Iteration: 00161790/00170000 content_loss:0.0000 lap_loss:5094.0000 rec_loss:9.5246 style_loss:221.0599 loss_tmp:33.7047 loss_tmp_GT:0.0947
Iteration: 00161800/00170000 content_loss:0.0000 lap_loss:1941.8975 rec_loss:31.7524 style_loss:250.0916 loss_tmp:68.2838 loss_tmp_GT:0.0637
Iteration: 00161810/00170000 content_loss:0.0000 lap_loss:3418.7014 rec_loss:20.3584 style_loss:232.5759 loss_tmp:42.6371 loss_tmp_GT:0.0911
Iteration: 00161820/00170000 content_loss:0.0000 lap_loss:418235104.0000 rec_loss:214.8812 style_loss:9325154.0000 loss_tmp:4957.1558 loss_tmp_GT:0.0785
Iteration: 00161830/00170000 content_loss:0.0000 lap_loss:9133.0684 rec_loss:72.6381 style_loss:698.3317 loss_tmp:58.0809 loss_tmp_GT:0.0579
Iteration: 00161840/00170000 content_loss:0.0000 lap_loss:9114.7314 rec_loss:48.7064 style_loss:624.2943 loss_tmp:56.9594 loss_tmp_GT:0.0858
Iteration: 00161850/00170000 content_loss:0.0000 lap_loss:16554.5078 rec_loss:104.8364 style_loss:1542.5042 loss_tmp:88.9630 loss_tmp_GT:0.0712
Iteration: 00161860/00170000 content_loss:0.0000 lap_loss:10247.7246 rec_loss:65.9900 style_loss:1027.9641 loss_tmp:96.4846 loss_tmp_GT:0.0727
Iteration: 00161870/00170000 content_loss:0.0000 lap_loss:19196.0527 rec_loss:77.2881 style_loss:1428.4135 loss_tmp:125.3436 loss_tmp_GT:0.0677
Iteration: 00161880/00170000 content_loss:0.0000 lap_loss:216289.6719 rec_loss:98.5644 style_loss:14655.6758 loss_tmp:218.4098 loss_tmp_GT:0.0702
Iteration: 00161890/00170000 content_loss:0.0000 lap_loss:19604.2520 rec_loss:50.9366 style_loss:1325.8600 loss_tmp:95.0826 loss_tmp_GT:0.0942
Iteration: 00161900/00170000 content_loss:0.0000 lap_loss:93659.1016 rec_loss:297.3892 style_loss:5191.9561 loss_tmp:227.0423 loss_tmp_GT:0.0611
Iteration: 00161910/00170000 content_loss:0.0000 lap_loss:86273.3594 rec_loss:174.2626 style_loss:4537.2666 loss_tmp:169.1176 loss_tmp_GT:0.0884
Iteration: 00161920/00170000 content_loss:0.0000 lap_loss:100730.4844 rec_loss:231.1616 style_loss:8772.8340 loss_tmp:295.8207 loss_tmp_GT:0.0779
Iteration: 00161930/00170000 content_loss:0.0000 lap_loss:389786.8125 rec_loss:618.0142 style_loss:26791.2461 loss_tmp:366.0463 loss_tmp_GT:0.0742
Iteration: 00161940/00170000 content_loss:0.0000 lap_loss:15906467840.0000 rec_loss:14955.5361 style_loss:303860064.0000 loss_tmp:72561.6250 loss_tmp_GT:0.0740
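The losses recover from a few isolated spikes (e.g. at iteration 00161270) but then diverge for good. Before restarting from a checkpoint, I am considering adding gradient clipping (`torch.nn.utils.clip_grad_norm_`) plus a guard that skips optimizer steps whose loss is non-finite or far above the running average. A minimal sketch of that guard idea (pure stdlib; the names and thresholds are mine, not from the CAP-VSTNet code):

```python
import math

def make_spike_guard(ratio=10.0, decay=0.99):
    """Return a callable that flags a loss value as a spike when it is
    non-finite or more than `ratio` times an exponential moving average
    of past losses. Spiky values are not folded into the average."""
    ema = None

    def is_spike(loss):
        nonlocal ema
        if not math.isfinite(loss):
            return True
        if ema is not None and loss > ratio * ema:
            return True  # skip this step; keep the EMA unpolluted
        ema = loss if ema is None else decay * ema + (1 - decay) * loss
        return False

    return is_spike

# Feeding it total losses resembling the log above:
guard = make_spike_guard()
for v in [0.52, 0.26, 0.44, 8.35, 0.32]:
    print(v, "spike" if guard(v) else "ok")
```

In the training loop the idea would be: if `is_spike(loss.item())` returns True, call `optimizer.zero_grad()` and skip the step instead of applying the exploding gradients.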
Running nvidia-smi gives the info below:
Tue Feb 13 11:46:41 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:5E:00.0 On | N/A |
| 51% 38C P2 103W / 350W | 13277MiB / 24576MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:AF:00.0 Off | N/A |
| 0% 39C P8 15W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 16179 G /usr/lib/xorg/Xorg 153MiB |
| 0 N/A N/A 610515 C python 13120MiB |
| 1 N/A N/A 16179 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
Should I give up on this training run, or can it still be recovered?