pyskl RuntimeError: CUDA error: unspecified launch failure

Open tribeband opened this issue 1 year ago • 0 comments

Traceback (most recent call last):
  File "/home/ps/ZW/pyskl/tools/train.py", line 177, in <module>
    main()
  File "/home/ps/ZW/pyskl/tools/train.py", line 169, in main
    train_model(model, datasets, cfg, validate=args.validate, test=test_option, timestamp=timestamp, meta=meta)
  File "/home/ps/ZW/pyskl/pyskl/apis/train.py", line 153, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/ps/anaconda3/envs/pyskl/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ps/anaconda3/envs/pyskl/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/ps/anaconda3/envs/pyskl/lib/python3.10/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/ps/anaconda3/envs/pyskl/lib/python3.10/site-packages/mmcv/runner/hooks/optimizer.py", line 59, in after_train_iter
    grad_norm = self.clip_grads(runner.model.parameters())
  File "/home/ps/anaconda3/envs/pyskl/lib/python3.10/site-packages/mmcv/runner/hooks/optimizer.py", line 50, in clip_grads
    return clip_grad.clip_grad_norm_(params, **self.grad_clip)
  File "/home/ps/anaconda3/envs/pyskl/lib/python3.10/site-packages/torch/nn/utils/clip_grad.py", line 76, in clip_grad_norm_
    torch._foreach_mul_(grads, clip_coef_clamped.to(device))  # type: ignore[call-overload]
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

The training run crashed abruptly with the error above, and I cannot work out what causes it. Please help.
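The error message itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1, because the frame reported in the traceback may not be where the kernel actually failed. A minimal sketch of doing that from Python is below. It assumes the run is started from the pyskl repository root through tools/train.py (the script named in the traceback); the helper names `blocking_env`, `build_command`, and `run_training`, and the config path in the usage comment, are hypothetical and only for illustration. The variable has to be in the environment before the training process initializes CUDA, which is why it is set on the child process rather than inside the training code.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # Makes each kernel launch synchronous, so the error surfaces at the
    # call that triggered it instead of at a later, unrelated API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_command(train_args):
    """Build the command line; assumes the current directory is the repo root."""
    return [sys.executable, "tools/train.py", *train_args]


def run_training(train_args):
    """Launch training in a child process that inherits the debug setting."""
    return subprocess.run(build_command(train_args), env=blocking_env()).returncode


# Hypothetical usage (the config path is a placeholder, not from the report):
# run_training(["path/to/config.py", "--validate"])
```

Training will probably be noticeably slower in this mode, so it is only meant for reproducing the crash and getting a traceback that points at the failing operation.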

Jun 02 '23 02:06 tribeband
