da-sac
RuntimeError: CUDA error
I am trying to train the base model on GTAV. This error occurs every time in the 3rd epoch, or in the 4th if I turn down the batch size, and the traceback always points to the same line. We need some help here ^-^
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/disk1/hl/anaconda3/envs/da-sac/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/disk1/hl/da-sac/train.py", line 518, in main_worker
score = time_call(trainer.validation, "Validation / {} / Val: ".format(val_set),
File "/disk1/hl/da-sac/train.py", line 498, in time_call
val = func(*args, **kwargs)
File "/disk1/hl/da-sac/train.py", line 378, in validation
masks_all = eval_batch(batch)
File "/disk1/hl/da-sac/train.py", line 358, in eval_batch
loss, masks = step_func(epoch, batch, train=False, visualise=False)
File "/disk1/hl/da-sac/train.py", line 151, in step
losses_ret[key] = val.mean().item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
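The last line of the message suggests `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch of how that could be applied from Python is below, assuming the variable is set before any CUDA work starts (safest is before `torch` is imported). The helper name `enable_blocking_cuda_launches` is hypothetical and not part of da-sac or PyTorch.

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Request synchronous CUDA kernel launches for debugging.

    Hypothetical helper: it only sets the environment variable named in
    the error message. With synchronous launches, a failing kernel should
    be reported at the call that launched it, not at a later API call
    such as .item().
    """
    # Set this before the first CUDA call; doing it before importing
    # torch is the safest ordering.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_cuda_launches()

# import torch  # import only after the variable is set
# ... then start training as usual; worker processes started afterwards
# inherit the parent's environment ...
```

The same effect can be had by prefixing the usual launch command with `CUDA_LAUNCH_BLOCKING=1` in the shell. Training will run slower in this mode, so it is only meant for locating the failing operation.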