mmskeleton
RuntimeError: CUDA error: device-side assert triggered
I ran this command to train the ST-GCN model: mmskl configs/recognition/st_gcn/dataset_example/train.yaml
Load configuration information from configs/recognition/st_gcn/dataset_example/train.yaml
INFO:mmcv.runner.runner:Start running, host: ai-pose@aipose-X570-GAMING-X, work_dir: /home/ai-pose/Desktop/Ma-aruf/Trials/Trial1/mmskeleton/work_dir/recognition/st_gcn/custom_dataset
INFO:mmcv.runner.runner:workflow: [('train', 5), ('val', 1)], max: 65 epochs
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
File "/home/ai-pose/anaconda3/envs/mm-test/bin/mmskl", line 7, in
I also hit this issue. The cause is that "The category_id will be set to -1 if the category annotations miss." See https://github.com/pytorch/pytorch/issues/1204: every target passed to the criterion must satisfy t >= 0 && t < n_classes. Maybe you can try changing the label -1 to a valid class index in that range.
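To find which samples break that condition before training, something like the following might help. It is only a sketch in plain Python: `find_invalid_labels` and the example label list are hypothetical and not part of mmskeleton, and it assumes you can load your labels as a list of integers.

```python
def find_invalid_labels(labels, num_class):
    """Return (position, label) pairs for labels outside [0, num_class - 1]."""
    return [(i, t) for i, t in enumerate(labels) if not (0 <= t < num_class)]


# Hypothetical labels: -1 stands for a missing category annotation,
# and 3 is out of range when num_class is 3.
labels = [0, 2, -1, 1, 3]
bad = find_invalid_labels(labels, num_class=3)
print(bad)  # [(2, -1), (4, 3)]
```

Any pair it reports would trigger the same assertion inside the loss kernel.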
I got this error when following CUSTOM_DATASET.md with my own dataset, because I had not changed the num_class: 3 parameter in configs/recognition/st_gcn/dataset_example/train.yaml.
You may also need to change the default num_class: 3 in test.yaml to your real number of classes.
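A quick way to see whether the configured num_class is large enough for your data is sketched below. The helper names are made up for illustration, and it assumes the labels are already zero-based integers, so the number of classes needed is the largest label plus one.

```python
def required_num_class(labels):
    """Smallest num_class that covers zero-based labels: max label + 1."""
    return max(labels) + 1


def check_num_class(labels, configured_num_class):
    """Raise ValueError if the configured num_class is too small for the labels."""
    needed = required_num_class(labels)
    if needed > configured_num_class:
        raise ValueError(
            f"num_class is {configured_num_class}, but the labels need at least {needed}"
        )
    return needed


# Hypothetical dataset with five classes (labels 0..4).
labels = [0, 1, 2, 3, 4, 2, 1]
print(check_num_class(labels, configured_num_class=5))  # 5
```

With the default num_class of 3, the same labels would raise ValueError, which is the mismatch described above.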
The problem is solved by changing the label indices from [1, N] to [0, N-1].
After debugging, I found that the error occurred at line 284 (./mmskeleton/mmskeleton/processor/recognition.py).
I checked the official documentation and learned that all target indices must be in the range [0, C-1].
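For anyone whose annotations start at 1, the shift can be sketched like this. `to_zero_based` is a hypothetical helper, not an mmskeleton function, and it assumes the labels are plain integers in [1, N].

```python
def to_zero_based(labels):
    """Shift labels from [1, N] to [0, N-1]; refuse input that is not 1-based."""
    if min(labels) < 1:
        raise ValueError("labels are expected to start at 1")
    return [t - 1 for t in labels]


# Hypothetical 1-based labels for a 3-class dataset.
labels = [1, 2, 3, 3, 1]
print(to_zero_based(labels))  # [0, 1, 2, 2, 0]
```

After the shift, the largest label is N-1, so it stays below num_class when num_class is set to N.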