
/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [5],CUDA error(719), unspecified launch failure.

Open monkeycc opened this issue 3 years ago • 2 comments

Search before asking

  • [X] I have searched the question and found no related answer.

Please ask your question

win11 anaconda python 3.7

paddledet 2.4.0 paddlepaddle-gpu 2.3.1.post112

Model: solov2_r50_fpn_1x_coco.yml

W0726 23:22:00.142284 13396 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.3, Runtime API Version: 11.2
W0726 23:22:00.157871 13396 gpu_context.cc:306] device: 0, cuDNN Version: 8.2.
[07/26 23:22:01] ppdet.utils.download INFO: Downloading ResNet50_cos_pretrained.pdparams from https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 92063/92063 [00:14<00:00, 6295.36KB/s]
[07/26 23:22:17] ppdet.utils.checkpoint INFO: Finish loading model weights: C:\Users\aaa/.cache/paddle/weights\ResNet50_cos_pretrained.pdparams
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [5].
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [5].
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [6].
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [6].
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [6].
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [5].
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [7].
Error: ../paddle/phi/kernels/gpu/one_hot_kernel.cu:38 Assertion `p_in_data[idx] >= 0 && p_in_data[idx] < depth` failed. Illegal index value, Input(input) value should be greater than or equal to 0, and less than depth [2], but received [6].
Traceback (most recent call last):
  File "tools/train.py", line 177, in <module>
    main()
  File "tools/train.py", line 173, in main
    run(FLAGS, cfg)
  File "tools/train.py", line 127, in run
    trainer.train(FLAGS.eval)
  File "D:\PaddleDetection\ppdet\engine\trainer.py", line 407, in train
    outputs = model(data)
  File "D:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "D:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "D:\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 54, in forward
    out = self.get_loss()
  File "D:\PaddleDetection\ppdet\modeling\architectures\solov2.py", line 94, in get_loss
    gt_ins_labels, gt_cate_labels, gt_grid_orders, fg_num)
  File "D:\PaddleDetection\ppdet\modeling\heads\solov2_head.py", line 405, in get_loss
    ins_pred_list, ins_labels, flatten_cate_preds, cate_labels, num_ins)
  File "D:\PaddleDetection\ppdet\modeling\losses\solov2_loss.py", line 99, in __call__
    alpha=self.focal_loss_alpha)
  File "D:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\nn\functional\loss.py", line 2048, in sigmoid_focal_loss
    alpha = fluid.dygraph.base.to_variable([alpha], dtype=loss.dtype)
  File "D:\anaconda3\envs\PaddleDetection\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "D:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "D:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\framework.py", line 434, in __impl__
    return func(*args, **kwargs)
  File "D:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\dygraph\base.py", line 768, in to_variable
    name=name if name else '')
OSError: (External) CUDA error(719), unspecified launch failure.
  [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointerand accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases canbe found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work willreturn the same error. To continue using CUDA, the process must be terminated and relaunched.] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:258)

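If this is a data problem, the error does not say which image is at fault; otherwise I could simply delete it.

Is there a tool, or a place in the code I can modify, that would show the problematic data, for example by printing the path of the image or JSON file?

If it is a CUDA problem, what should I do?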

monkeycc avatar Jul 26 '22 15:07 monkeycc

It is a data problem: the categories are not set correctly, and the labels exceed the configured number of classes.

wangxinxin08 avatar Jul 27 '22 02:07 wangxinxin08
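
To locate the offending images, the annotation file can be scanned directly. The following is a minimal sketch, not part of PaddleDetection: the helper names `find_offending_images` and `load_and_check` are hypothetical, and it assumes a standard COCO-style JSON (`images`, `annotations`, `categories`) in which the class index of an annotation is the position of its `category_id` in the sorted list of declared category ids. If the loader in use maps ids differently, the check needs to be adapted.

```python
import json
from collections import defaultdict


def find_offending_images(coco, num_classes):
    """Return {file_name: sorted category ids} for out-of-range annotations.

    Assumption (not confirmed by the thread): the class index of an
    annotation is the position of its category_id in the sorted list of
    category ids declared under "categories".
    """
    cat_ids = sorted(c["id"] for c in coco.get("categories", []))
    cat_to_index = {cid: i for i, cid in enumerate(cat_ids)}
    id_to_file = {img["id"]: img["file_name"] for img in coco.get("images", [])}

    offending = defaultdict(set)
    for ann in coco.get("annotations", []):
        index = cat_to_index.get(ann["category_id"])
        # An undeclared category id (index is None) is reported as well.
        if index is None or index >= num_classes:
            name = id_to_file.get(
                ann["image_id"], "<unknown image %s>" % ann["image_id"]
            )
            offending[name].add(ann["category_id"])
    return {name: sorted(ids) for name, ids in offending.items()}


def load_and_check(annotation_path, num_classes):
    """Load a COCO-style annotation file and run the check above."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        return find_offending_images(json.load(f), num_classes)
```

Calling `load_and_check` with the path of the training annotation file and the `num_classes` value from the config would list each image whose labels fall outside the configured range, together with the category ids involved.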

Then the problem is with the official conversion tool.

python tools/x2coco.py --dataset_type labelme --json_input_dir  E:/biaozhuCOCO/JSON   --image_input_dir  E:/biaozhuCOCO/IMG   --output_dir  E:/biaozhuCOCO/COCO   --train_proportion 0.7  --val_proportion 0.2  --test_proportion 0.1

How should this problem with the COCO conversion be resolved? @wangxinxin08

monkeycc avatar Jul 27 '22 07:07 monkeycc
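
Before blaming the converter, it may help to look at which label strings the labelme files actually contain, since a stray or misspelled label would typically end up as an extra category. The sketch below is independent of `tools/x2coco.py`; the helper names `collect_labels` and `load_labelme_dir` are hypothetical, and it assumes the usual labelme layout in which each JSON file has a `shapes` list whose entries carry a `label` field.

```python
import json
from collections import defaultdict
from pathlib import Path


def collect_labels(documents):
    """Map each label string to the sorted list of files that use it.

    `documents` is an iterable of (file_name, parsed_json) pairs, where
    parsed_json is assumed to follow the labelme layout ("shapes" -> "label").
    """
    usage = defaultdict(set)
    for file_name, doc in documents:
        for shape in doc.get("shapes", []):
            usage[shape.get("label")].add(file_name)
    return {label: sorted(files) for label, files in usage.items()}


def load_labelme_dir(json_dir):
    """Yield (file_name, parsed_json) for every *.json file in json_dir."""
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            yield path.name, json.load(f)
```

Running `collect_labels(load_labelme_dir(...))` on the directory passed as `--json_input_dir` shows every distinct label and the files that use it; more distinct labels than expected would point to the annotations rather than the conversion step.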

https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/datasets/coco_instance.yml#L2 Please check whether the configuration at this line matches your actual dataset.

jerrywgz avatar Aug 16 '22 08:08 jerrywgz
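
The consistency check suggested above can be sketched as follows, assuming the linked line is the `num_classes` entry of the dataset config and that the annotation file is standard COCO JSON. The helper names `summarize_categories` and `num_classes_matches` are hypothetical, and `num_classes` is passed in by hand rather than parsed from the YAML file.

```python
from collections import Counter


def summarize_categories(coco):
    """Return (category_id, name, annotation_count) tuples sorted by id."""
    counts = Counter(ann["category_id"] for ann in coco.get("annotations", []))
    categories = sorted(coco.get("categories", []), key=lambda c: c["id"])
    return [(c["id"], c.get("name", ""), counts.get(c["id"], 0)) for c in categories]


def num_classes_matches(coco, num_classes):
    """True when the config value equals the number of declared categories."""
    return len(coco.get("categories", [])) == num_classes
```

If `num_classes_matches` returns `False`, either the config value or the category list in the converted annotation file needs to be corrected so that the two agree.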