
RuntimeError: CUDA error

Open kandysoso opened this issue 1 year ago • 0 comments

I am trying to train the base model on GTAV. This error occurs every time in the 3rd epoch, or in the 4th if I reduce the batch size, and the traceback always points to the same line. We need some help here ^-^

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/disk1/hl/anaconda3/envs/da-sac/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/disk1/hl/da-sac/train.py", line 518, in main_worker
    score = time_call(trainer.validation, "Validation / {} / Val: ".format(val_set),
  File "/disk1/hl/da-sac/train.py", line 498, in time_call
    val = func(*args, **kwargs)
  File "/disk1/hl/da-sac/train.py", line 378, in validation
    masks_all = eval_batch(batch)
  File "/disk1/hl/da-sac/train.py", line 358, in eval_batch
    loss, masks = step_func(epoch, batch, train=False, visualise=False)
  File "/disk1/hl/da-sac/train.py", line 151, in step
    losses_ret[key] = val.mean().item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
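The error message itself suggests CUDA_LAUNCH_BLOCKING=1. Because CUDA kernels run asynchronously, the `.item()` call at line 151 may only be where the failure was noticed, not where it happened. A minimal sketch of setting the variable from Python follows; the helper name `enable_blocking_cuda_launches` is hypothetical and not part of da-sac or PyTorch, and it assumes the call is made before any CUDA work starts.

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Hypothetical helper for illustration. With this variable set, CUDA
    kernel launches run synchronously, so an error is reported at the
    call that triggered it instead of at a later, unrelated call.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Assumption: this runs at the very top of the entry script, before the
# first CUDA call (ideally before `import torch`), so that the setting is
# already in place when CUDA initialises. Child processes started later
# inherit the parent's environment.
enable_blocking_cuda_launches()
```

The same effect can be had by prefixing the usual launch command with CUDA_LAUNCH_BLOCKING=1 in the shell. Training will run slower with this setting, so it is best used only to get a traceback that points at the real failing operation.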

kandysoso, Oct 04 '22 05:10