minimind
Help: train_pretrain.py: RuntimeError: CUDA error: device kernel image is invalid
Traceback (most recent call last):
File "/data/aigc/model_train/minimind/train_pretrain.py", line 169, in TORCH_USE_CUDA_DSA
to enable device-side assertions.
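Hi, could someone please help? I run the training script with python train_pretrain.py and get the error above. I spent the whole afternoon on it and still can't see what's wrong.
System: CentOS 7
CUDA: 11.8
PyTorch: 2.3.1
GPU: Tesla T4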