
PyTorch training error: CUDA error: device-side assert triggered

Open smoothumut opened this issue 4 years ago • 4 comments

Hi, thanks for your great work. I have one GPU and get this error in the middle of training. What could be the reason? Thanks in advance for your help and your project.

Sun Jan 31 00:02:48 2021 Epoch: 8 Sample 7025/7200 Loss: 352734.875
Sun Jan 31 00:02:49 2021 Epoch: 8 Sample 7030/7200 Loss: 278105.09375
Sun Jan 31 00:02:51 2021 Epoch: 8 Sample 7035/7200 Loss: 302418.0
libpng error: bad adaptive filter value
Sun Jan 31 00:02:53 2021 Epoch: 8 Sample 7040/7200 Loss: 293290.03125
Sun Jan 31 00:02:55 2021 Epoch: 8 Sample 7045/7200 Loss: 288517.75
Sun Jan 31 00:02:56 2021 Epoch: 8 Sample 7050/7200 Loss: 316305.53125
C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/Loss.cu:102: block: [237,0,0], thread: [32,0,0] Assertion `input_val >= zero && input_val <= one` failed.
[the same assertion repeats for threads 0-63 of block [237,0,0]]
Traceback (most recent call last):
  File "c:/Users/xx/Documents/Projects/DexiNed/main.py", line 359, in <module>
    main(args)
  File "c:/Users/xx/Documents/Projects/DexiNed/main.py", line 336, in main
    train_one_epoch(epoch,
  File "c:/Users/xx/Documents/Projects/DexiNed/main.py", line 36, in train_one_epoch
    loss = sum([criterion(preds, labels,l_w) for preds, l_w in zip(preds_list,l_weight)])
  File "c:/Users/xx/Documents/Projects/DexiNed/main.py", line 36, in <listcomp>
    loss = sum([criterion(preds, labels,l_w) for preds, l_w in zip(preds_list,l_weight)])
  File "c:\Users\xx\Documents\Projects\DexiNed\losses.py", line 85, in bdcn_loss
    weight.masked_scatter_(targets > 0.,
RuntimeError: CUDA error: device-side assert triggered

smoothumut avatar Jan 31 '21 05:01 smoothumut
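For context: the assertion at Loss.cu:102 is the input-range check inside binary_cross_entropy, which requires both predictions and targets to lie in [0, 1]; NaN or Inf values (for example after the loss explodes) fail that check as well, and the libpng error above suggests at least one damaged image in the dataset. Because CUDA kernels run asynchronously, the Python traceback points at a later op (masked_scatter_ in bdcn_loss) rather than the call that actually tripped the assert. Below is a minimal host-side sanity check, assuming the loss is the usual sigmoid + F.binary_cross_entropy combination; the helper name and call site are illustrative, not part of the repo.

```python
import torch

def assert_bce_ready(preds, targets):
    """Raise a readable Python error before binary_cross_entropy's
    CUDA kernel hits its device-side assert.

    binary_cross_entropy requires both tensors to be in [0, 1]; NaN/Inf
    in the predictions also trips the
    `input_val >= zero && input_val <= one` assertion.
    """
    for name, t in (("preds", preds), ("targets", targets)):
        if not torch.isfinite(t).all():
            raise ValueError(f"{name} contains NaN/Inf")
        lo, hi = t.min().item(), t.max().item()
        if lo < 0.0 or hi > 1.0:
            raise ValueError(f"{name} outside [0, 1]: min={lo}, max={hi}")

# Hypothetical usage, placed just before the BCE term inside the loss:
#   probs = torch.sigmoid(inputs)
#   assert_bce_ready(probs, targets)
#   bce = torch.nn.functional.binary_cross_entropy(probs, targets, weight)
```

If the check fires on the predictions, the usual suspects are a too-high learning rate or corrupted training samples; if it fires on the targets, the edge maps are not scaled to [0, 1] when loaded.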

any idea??

smoothumut avatar Feb 04 '21 07:02 smoothumut

> any idea??

Hi @smoothumut, and sorry for my late response, but I am afraid I cannot help you with this directly. In the coming weeks I will be updating the repo and the BIPED dataset; I hope this solves the problem.

xavysp avatar Feb 20 '21 18:02 xavysp

Hi, could you check now?

xavysp avatar Apr 29 '21 22:04 xavysp

@xavysp I am facing the same problem on Colab. Is there any way around it?

Arnab1181412 avatar Jan 23 '24 17:01 Arnab1181412
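Not a fix, but one way to narrow this down on Colab is to make CUDA kernel launches synchronous so the traceback points at the op that actually fails (or run a few batches on CPU to get the normal Python exception). A sketch, assuming training is started from main.py:

```python
import os

# Must be set before any CUDA work (put it at the very top of main.py,
# before torch is imported). Kernel launches then run synchronously, so
# the device-side assert surfaces at the real call site instead of at a
# later, unrelated op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```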