pysot
CUDA error: device-side assert triggered at
When I run train.py in the folder 'siamrpn_alex_dwxcorr_otb', I get this error: 'RuntimeError: cuda runtime error (710) : device-side assert triggered at C:/w/b/windows/pytorch/aten/src\THCUNN/generic/ClassNLLCriterion.cu:115'. Does anyone know how to solve it?
Currently it does not support Windows
When I run train.py on Ubuntu, I hit the same problem. The error message is as follows:
/opt/conda/conda-bld/pytorch_1535493744281/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [4,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
...
...
...
Traceback (most recent call last):
File "tools/train.py", line 346, in
If someone can help me, I will be very grateful!
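The assertion srcIndex < srcSelectDimSize says that at least one index passed to the index-select kernel is not smaller than the size of the dimension being selected from. A minimal sketch of a bounds check that could be run just before the failing call follows. Note that find_out_of_range is a hypothetical helper, not part of pysot, and it works on plain Python integers so it can be tried without a GPU.

```python
def find_out_of_range(indices, dim_size):
    """Return (position, value) pairs for indices outside [0, dim_size)."""
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if idx < 0 or idx >= dim_size
    ]


# Hypothetical usage before a call such as torch.index_select(pred, 0, select):
#   bad = find_out_of_range(select.tolist(), pred.size(0))
#   assert not bad, f"out-of-range indices: {bad[:5]}"

if __name__ == "__main__":
    # Index 7 does not fit a dimension of size 5.
    print(find_out_of_range([0, 3, 7, 2], 5))  # [(2, 7)]
```

If the check reports anything, the labels or index tensors built by the data pipeline are the first place to look, since they decide which rows are selected.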
Same here, I get the error on Ubuntu:
File "/home/xxc/project/GeekPlusA-ai-pysot-master/pysot/pysot/models/loss.py", line 16, in get_cls_loss
pred = torch.index_select(pred, 0, select)
RuntimeError: CUDA error: device-side assert triggered
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 22772) of binary: /home/xxc/miniconda3/envs/pysot/bin/python3
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
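Because CUDA kernels run asynchronously, the Python line shown in a device-side assert traceback is not always the line that caused it. A common way to get a more reliable traceback is to force synchronous kernel launches with the CUDA_LAUNCH_BLOCKING environment variable. The sketch below assumes it is placed at the very top of the training script, before torch is imported, which is the safest place to make sure the setting is seen.

```python
import os

# Make CUDA kernel launches synchronous so that errors surface at the
# Python line that launched the failing kernel. Set this before torch
# is imported so it is in place before CUDA is initialised.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

if __name__ == "__main__":
    print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

This slows training down, so it is best used only while debugging. Running the same batch on CPU is another option, since CPU indexing errors usually come with a readable message.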
Currently it does not support Windows
I can run it on Win11 in a conda environment, either with CUDA or on CPU:
GPU Case
First check your CUDA version; mine is 11.6, for example:
nvcc --version
Next, I updated my conda environment with:
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
The following packages will be UPDATED:
  ca-certificates    pkgs/main::ca-certificates-2022.10.11~ --> conda-forge::ca-certificates-2022.12.7-h5b45459_0
  certifi            pkgs/main/win-64::certifi-2022.9.24-p~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0
  pytorch            0.4.1-py37_cuda90_cudnn7he774522_1 --> 1.12.0-py3.7_cuda11.6_cudnn8_0
  torchvision        pytorch/noarch::torchvision-0.2.1-py_2 --> pytorch/win-64::torchvision-0.13.0-py37_cu116
That's it. The model will run on GPU.
Tested with:
python tools/demo.py --config experiments/siamrpn_mobilev2_l234_dwxcorr/config.yaml --snapshot experiments/siamrpn_mobilev2_l234_dwxcorr/model.pth --video demo/bag.avi
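To confirm that the updated environment really sees the GPU, a quick check along the following lines may help. It is only a sketch: cuda_summary is a hypothetical helper name, and the function returns early if torch is not installed in the active environment.

```python
def cuda_summary():
    """Collect basic facts about the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        # torch is missing from the active environment
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in cuda_summary().items():
        print(f"{key}: {value}")
```

If cuda_available is False even though built_with_cuda shows a version, the driver or the cudatoolkit package most likely does not match the installed build.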
CPU Case
If you want to run it on CPU only, you do not need to update packages.
Just open demo.py and insert the following line at the top of main():
cfg.CUDA = False
e.g.
def main():
    # load config
    cfg.CUDA = False
    cfg.merge_from_file(args.config)
    ...
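Instead of hard-coding False, the flag could be turned off only when no GPU is visible. The sketch below shows that idea with a stand-in config object; resolve_cuda_flag is a hypothetical helper and the SimpleNamespace only imitates pysot's cfg for illustration. In demo.py the second argument would presumably come from torch.cuda.is_available().

```python
from types import SimpleNamespace


def resolve_cuda_flag(requested, cuda_available):
    """Use CUDA only if the config asks for it and a GPU is visible."""
    return bool(requested and cuda_available)


if __name__ == "__main__":
    # Stand-in for pysot's cfg object (assumption, for illustration only).
    cfg = SimpleNamespace(CUDA=True)

    # Pretend no GPU is visible; in demo.py this value would come
    # from torch.cuda.is_available().
    cfg.CUDA = resolve_cuda_flag(cfg.CUDA, cuda_available=False)
    print(cfg.CUDA)  # False
```

That way the same script runs on both GPU and CPU machines without editing it each time.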