mmskeleton RuntimeError: CUDA error: device-side assert triggered

I ran the following command to train the ST-GCN model:

mmskl configs/recognition/st_gcn/dataset_example/train.yaml

Load configuration information from configs/recognition/st_gcn/dataset_example/train.yaml
INFO:mmcv.runner.runner:Start running, host: ai-pose@aipose-X570-GAMING-X, work_dir: /home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/work_dir/recognition/st_gcn/custom_dataset
INFO:mmcv.runner.runner:workflow: [('train', 5), ('val', 1)], max: 65 epochs
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "/home/ai-pose/anaconda3/envs/mm-test/bin/mmskl", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/tools/mmskl", line 131, in <module>
    main()
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/tools/mmskl", line 121, in main
    call_obj(**cfg.processor_cfg)
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/mmskeleton/utils/importer.py", line 24, in call_obj
    return import_obj(type)(**kwargs)
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/mmskeleton/processor/recognition.py", line 120, in train
    runner.run(data_loaders, workflow, total_epochs, loss=loss)
  File "/home/ai-pose/anaconda3/envs/mm-test/lib/python3.7/site-packages/mmcv/runner/runner.py", line 359, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ai-pose/anaconda3/envs/mm-test/lib/python3.7/site-packages/mmcv/runner/runner.py", line 263, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/mmskeleton/processor/recognition.py", line 135, in batch_processor
    log_vars = dict(loss=losses.item())
RuntimeError: CUDA error: device-side assert triggered

Mar 15 '21 11:03 MaarufB

I also ran into this issue. I believe the cause is that "The category_id will be set to -1 if the category annotations miss." As discussed in https://github.com/pytorch/pytorch/issues/1204, every target passed to the criterion must satisfy t >= 0 && t < n_classes, so a -1 label triggers the assertion. Maybe you can try changing the -1 labels to a valid class index, i.e. one in [0, n_classes).
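Because CUDA reports the failure asynchronously (here it only surfaces at losses.item()), it can help to check the labels on the CPU before training. The following is a minimal sketch, assuming the labels are available as a plain list of ints; find_invalid_labels is a hypothetical helper, not part of mmskeleton or PyTorch. It returns the positions of every label that would violate t >= 0 && t < n_classes.

```python
from typing import List, Sequence


def find_invalid_labels(labels: Sequence[int], num_class: int) -> List[int]:
    """Return the positions of labels outside [0, num_class).

    Hypothetical helper: these are exactly the targets that would trip the
    ClassNLLCriterion assertion on the GPU (for example, a missing
    category_id stored as -1, or a label equal to num_class).
    """
    return [i for i, t in enumerate(labels) if not 0 <= t < num_class]


if __name__ == "__main__":
    # Example labels: -1 (missing annotation) and 3 (too large for 3 classes).
    labels = [0, 2, -1, 1, 3]
    print(find_invalid_labels(labels, num_class=3))  # [2, 4]
```

Running a check like this over the whole training and validation set points directly at the offending samples instead of an opaque device-side assert.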

Apr 19 '21 16:04 zren2

Following CUSTOM_DATASET.md, I got this error when using my own dataset because I had not changed the num_class: 3 parameter in configs/recognition/st_gcn/dataset_example/train.yaml.

You may also need to change the default num_class: 3 in test.yaml to your actual number of classes.
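The num_class value in train.yaml and test.yaml has to cover every label in the dataset. A minimal sketch of that sanity check follows, assuming the labels are already 0-based ints; required_num_class and check_num_class are hypothetical helpers for illustration, not mmskeleton functions.

```python
from typing import Sequence


def required_num_class(labels: Sequence[int]) -> int:
    """Smallest num_class covering every label (assumes 0-based labels)."""
    if not labels:
        raise ValueError("no labels given")
    if min(labels) < 0:
        raise ValueError(f"negative label found: {min(labels)}")
    return max(labels) + 1


def check_num_class(labels: Sequence[int], num_class: int) -> None:
    """Raise if the configured num_class is too small for the dataset."""
    needed = required_num_class(labels)
    if num_class < needed:
        raise ValueError(
            f"num_class={num_class} but labels go up to {needed - 1}; "
            f"set num_class to at least {needed} in train.yaml and test.yaml"
        )


if __name__ == "__main__":
    labels = [0, 1, 2, 3, 4]  # a hypothetical 5-class custom dataset
    print(required_num_class(labels))  # 5
    try:
        check_num_class(labels, num_class=3)  # the dataset_example default
    except ValueError as err:
        print(err)
```

Since the same value appears in both config files, checking it once against the real label set catches a mismatch in either.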

Oct 15 '21 15:10 renlle

The problem was solved by changing the label indices from [1, N] to [0, N-1]. After debugging, I found that the error occurred at line 284 of ./mmskeleton/mmskeleton/processor/recognition.py. I checked the official documentation and learned that all target indices must be in the range [0, C-1], where C is the number of classes.
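The fix described above, moving labels from [1, N] to [0, N-1], can be sketched as a small remapping step applied when the dataset is built. This assumes the labels are plain ints; remap_to_zero_based is a hypothetical helper, and it maps the sorted distinct labels onto 0..K-1, which for labels in [1, N] with every class present is the same as subtracting 1.

```python
from typing import Dict, List, Sequence, Tuple


def remap_to_zero_based(labels: Sequence[int]) -> Tuple[List[int], Dict[int, int]]:
    """Map arbitrary integer labels onto contiguous indices 0..K-1.

    Hypothetical helper. Returns the remapped labels and the old->new
    mapping, so predictions can be translated back to the original IDs.
    """
    mapping = {old: new for new, old in enumerate(sorted(set(labels)))}
    return [mapping[t] for t in labels], mapping


if __name__ == "__main__":
    new_labels, mapping = remap_to_zero_based([1, 3, 2, 3, 1])
    print(new_labels)  # [0, 2, 1, 2, 0]
    print(mapping)     # {1: 0, 2: 1, 3: 2}
```

After remapping, num_class should be len(mapping). Compacting labels this way also removes gaps left by unused class IDs; if the original IDs must stay meaningful, plain subtraction of 1 plus a num_class of at least max label + 1 is the simpler choice.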

Nov 06 '22 07:11 zzy0222
