CUDA error: device-side assert triggered

Open Inosonnia opened this issue 5 years ago • 2 comments

After generating the mask with encode_segmap, SpatialClassNLLCriterion.cu reports "Assertion t >= 0 && t < n_classes failed". It seems that some labels fall outside the range [0, n_classes). I used the default data mentioned in the source code. Thank you.

Inosonnia avatar Nov 27 '18 02:11 Inosonnia

@Inosonnia Can you please provide more details, such as the label values that seem to have exceeded n_classes?

shahsohil avatar Nov 27 '18 11:11 shahsohil

Since I used the default segmentation data mentioned in the source code (i.e., VOC12+benchmark_RELEASE), the classes do fall in [0, 20], as I confirmed by printing them out. The class information in the data seems fine (the mask generation step runs without problems), and the error occurs even when I change n_classes to a large value (e.g., num_classes = 1000 in models/pretrained/sunet.py).

The detailed info is listed here:

pytorch/aten/src/THC/generated/../THCTensorMathCompare.cuh line=82 error=59 : device-side assert triggered
pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [768,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "train_seg.py", line 340, in <module>
    train(args)
  File "train_seg.py", line 146, in train
    trainmodel(model, optimizer, trainloader, epoch, scheduler, traindata)
  File "train_seg.py", line 226, in trainmodel
    model(imagesV, labelsV)
  File "/python/lib/python2.7/site-packages/torch/nn/modules/module.py", line 479, in __call__
    result = self.forward(*input, **kwargs)
  File "/python/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/python/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/python/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../THCTensorMathCompare.cuh:82

Inosonnia avatar Dec 10 '18 07:12 Inosonnia
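
One way to narrow this down is to check the label values on the CPU just before they reach the loss. The assertion fires for any target t outside [0, n_classes), so a negative label would trigger it no matter how large n_classes is. The sketch below is not code from this repository. The helper name find_out_of_range_labels and the sample mask are hypothetical, and the value 255 is only an assumption based on the void label used in VOC-style masks; the thread does not establish that it is present here.

```python
from collections import Counter


def find_out_of_range_labels(labels, n_classes, ignore_index=None):
    """Return {value: count} for every label outside [0, n_classes).

    `labels` is any flat iterable of ints. `ignore_index`, if given,
    is skipped, mirroring how a loss configured with an ignore index
    would treat that value.
    """
    bad = Counter()
    for t in labels:
        if ignore_index is not None and t == ignore_index:
            continue
        if not (0 <= t < n_classes):
            bad[t] += 1
    return dict(bad)


# Hypothetical flattened mask: classes 0..20 plus an assumed void label 255.
mask = [0, 15, 20, 255, 255, 7]

print(find_out_of_range_labels(mask, n_classes=21))
# {255: 2}
print(find_out_of_range_labels(mask, n_classes=21, ignore_index=255))
# {}
```

In the training loop, the flat list could come from something like labelsV.cpu().view(-1).tolist() for the batch that fails (an assumption about how the tensors are laid out in train_seg.py). If unexpected values show up, either remapping them in the mask generation step or passing them as ignore_index to the loss may be worth trying. Running with the environment variable CUDA_LAUNCH_BLOCKING=1, or on the CPU, usually makes the traceback point at the call that actually failed.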