PaddleDetection
ValueError: Target 460 is out of upper bound.
Search before asking
- [X] I have searched the question and found no related answer.
Please ask your question
Model: ppyoloe_crn_s_300e_coco, trained on a VOC dataset
python tools/train.py -c configs/ppyoloe/ppyoloe_crn_s_300e_coco.yml
W0823 14:30:26.446256 4452 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.2
W0823 14:30:26.461884 4452 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[08/23 14:30:27] ppdet.utils.checkpoint INFO: Finish loading model weights: C:\Users\fujunnnn/.cache/paddle/weights\CSPResNetb_s_pretrained.pdparams
[08/23 14:30:30] ppdet.engine INFO: Epoch: [0] [ 0/339] learning_rate: 0.000000 loss: 1931307253760.000000 loss_cls: 0.594841 loss_iou: 772522901504.000000 loss_dfl: 5885.125977 loss_l1: 0.105123 eta: 4 days, 9:30:32 batch_cost: 3.7348 data_cost: 0.2500 ips: 2.6775 images/s
Traceback (most recent call last):
File "tools/train.py", line 177, in <module>
main()
File "tools/train.py", line 173, in main
run(FLAGS, cfg)
File "tools/train.py", line 127, in run
trainer.train(FLAGS.eval)
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\engine\trainer.py", line 454, in train
outputs = model(data)
File "E:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "E:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 59, in forward
out = self.get_loss()
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 125, in get_loss
return self._forward()
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 88, in _forward
yolo_losses = self.yolo_head(neck_feats, self.inputs)
File "E:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "E:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 217, in forward
return self.forward_train(feats, targets)
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 160, in forward_train
], targets)
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 355, in get_loss
assigned_scores_sum)
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 291, in _bbox_loss
assigned_ltrb_pos) * bbox_weight
File "E:\PaddleX_GUI_2.1.0_win10\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 256, in _df_loss
pred_dist, target_left, reduction='none') * weight_left
File "E:\anaconda3\envs\PaddleDetection\lib\site-packages\paddle\nn\functional\loss.py", line 1723, in cross_entropy
label_max.item()))
ValueError: Target 25479 is out of upper bound.
python tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml
W0823 21:31:38.730271 10200 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.2
W0823 21:31:38.750262 10200 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[08/23 21:31:40] ppdet.utils.checkpoint INFO: The shape [365] in pretrained weight yolo_head.pred_cls.0.bias is unmatched with the shape [4] in model yolo_head.pred_cls.0.bias. And the weight yolo_head.pred_cls.0.bias will not be loaded
[08/23 21:31:40] ppdet.utils.checkpoint INFO: The shape [365, 384, 3, 3] in pretrained weight yolo_head.pred_cls.0.weight is unmatched with the shape [4, 384, 3, 3] in model yolo_head.pred_cls.0.weight. And the weight yolo_head.pred_cls.0.weight will not be loaded
[08/23 21:31:40] ppdet.utils.checkpoint INFO: The shape [365] in pretrained weight yolo_head.pred_cls.1.bias is unmatched with the shape [4] in model yolo_head.pred_cls.1.bias. And the weight yolo_head.pred_cls.1.bias will not be loaded
[08/23 21:31:40] ppdet.utils.checkpoint INFO: The shape [365, 192, 3, 3] in pretrained weight yolo_head.pred_cls.1.weight is unmatched with the shape [4, 192, 3, 3] in model yolo_head.pred_cls.1.weight. And the weight yolo_head.pred_cls.1.weight will not be loaded
[08/23 21:31:40] ppdet.utils.checkpoint INFO: The shape [365] in pretrained weight yolo_head.pred_cls.2.bias is unmatched with the shape [4] in model yolo_head.pred_cls.2.bias. And the weight yolo_head.pred_cls.2.bias will not be loaded
[08/23 21:31:40] ppdet.utils.checkpoint INFO: The shape [365, 96, 3, 3] in pretrained weight yolo_head.pred_cls.2.weight is unmatched with the shape [4, 96, 3, 3] in model yolo_head.pred_cls.2.weight. And the weight yolo_head.pred_cls.2.weight will not be loaded
[08/23 21:31:40] ppdet.utils.checkpoint INFO: Finish loading model weights: C:\Users\MM/.cache/paddle/weights\ppyoloe_crn_s_obj365_pretrained.pdparams
Traceback (most recent call last):
File "tools/train.py", line 172, in <module>
main()
File "tools/train.py", line 168, in main
run(FLAGS, cfg)
File "tools/train.py", line 132, in run
trainer.train(FLAGS.eval)
File "D:\0SDXX\PaddleDetection\ppdet\engine\trainer.py", line 504, in train
outputs = model(data)
File "D:\Anaconda3\envs\PaddleSeg\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "D:\Anaconda3\envs\PaddleSeg\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "D:\0SDXX\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 59, in forward
out = self.get_loss()
File "D:\0SDXX\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 124, in get_loss
return self._forward()
File "D:\0SDXX\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 88, in _forward
yolo_losses = self.yolo_head(neck_feats, self.inputs)
File "D:\Anaconda3\envs\PaddleSeg\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "D:\Anaconda3\envs\PaddleSeg\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "D:\0SDXX\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 216, in forward
return self.forward_train(feats, targets)
File "D:\0SDXX\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 161, in forward_train
], targets)
File "D:\0SDXX\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 354, in get_loss
assigned_scores_sum)
File "D:\0SDXX\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 290, in _bbox_loss
assigned_ltrb_pos) * bbox_weight
File "D:\0SDXX\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 255, in _df_loss
pred_dist, target_left, reduction='none') * weight_left
File "D:\Anaconda3\envs\PaddleSeg\lib\site-packages\paddle\nn\functional\loss.py", line 1723, in cross_entropy
label_max.item()))
ValueError: Target 28 is out of upper bound.
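Both tracebacks end in `paddle.nn.functional.cross_entropy`, called from `_df_loss` with `target_left` as the label. Cross-entropy with integer labels needs every label to be a valid bin index, so values such as 28 or 25479 are rejected when the prediction has fewer bins than that. The following is a minimal plain-Python sketch of that bound check, not PaddleDetection code: `find_out_of_range_targets` is a hypothetical helper, and the bin count of 17 is an assumption for illustration (the real value depends on the head's configuration).

```python
def find_out_of_range_targets(targets, num_bins):
    """Return (index, value) pairs for integer targets outside [0, num_bins)."""
    return [(i, t) for i, t in enumerate(targets) if t < 0 or t >= num_bins]


# Assumed bin count, for illustration only; the real value comes from the head config.
NUM_BINS = 17

# 28 and 25479 are the targets reported in the two tracebacks above.
targets = [0, 3, 16, 28, 25479]
print(find_out_of_range_targets(targets, NUM_BINS))
```

Targets this far out of range suggest that the assigned box distances feeding the loss are already corrupted before `cross_entropy` is reached.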
Have you tried other models? Do they show the same problem?
Did you change anything in the configuration?
On Windows, please switch to paddle 2.2.2. Higher versions currently have a bug, which we will fix as soon as possible. On Linux the versions are fine.
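How do I install the GPU build of version 2.2.2? Are there instructions?
@ionescofung https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/windows-pip.html Follow this installation guide and set the version in the install command to paddlepaddle-gpu==2.2.2.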
I still get the same error with the develop version. I really don't want to roll back to 2.2.2, because it doesn't support CUDA 11.6 and I would have to install 11.2.
@lazyn1997 On our side, develop tests fine.
Is it related to the network being trained? I am using ppyoloe.
Are you training on a single GPU?
Yes, single-GPU training.
Which PaddleDetection version are you using?
release/2.5
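Check whether your code contains this line: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/ppdet/modeling/heads/ppyoloe_head.py#L350
Yes, it has that line.
Pull the latest code, run it again, and print assigned_scores_sum.
Debug it and see.
Or could you share your environment with me? I really cannot reproduce this problem locally.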
Package Version
astor 0.8.1
attrs 22.1.0
Babel 2.11.0
bce-python-sdk 0.8.74
certifi 2022.9.24
charset-normalizer 2.1.1
click 8.1.3
colorama 0.4.6
cycler 0.11.0
Cython 0.29.32
decorator 5.1.1
dill 0.3.6
exceptiongroup 1.0.4
filterpy 1.4.5
Flask 2.2.2
Flask-Babel 2.0.0
fonttools 4.25.0
future 0.18.2
idna 3.4
importlib-metadata 5.0.0
iniconfig 1.1.1
itsdangerous 2.1.2
Jinja2 3.1.2
joblib 1.2.0
kiwisolver 1.4.2
lap 0.4.0
lxml 4.9.1
MarkupSafe 2.1.1
matplotlib 3.5.2
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
motmetrics 1.2.5
multiprocess 0.70.14
munkres 1.1.4
numpy 1.21.5
opencv-python 4.6.0.66
opt-einsum 3.3.0
packaging 21.3
paddle-bfloat 0.1.7
paddledet 2.5.0
paddlepaddle-gpu 0.0.0.post116
pandas 1.3.5
Pillow 9.3.0
pip 22.2.2
pluggy 1.0.0
protobuf 3.20.0
pyclipper 1.3.0.post4
pycocotools 2.0.2
pycryptodome 3.15.0
pyparsing 3.0.9
PyQt5 5.15.7
PyQt5-Qt5 5.15.2
PyQt5-sip 12.11.0
pytest 7.2.0
pytest-timeout 2.1.0
python-dateutil 2.8.2
pytz 2022.6
PyYAML 6.0
requests 2.28.1
scikit-learn 1.0.2
scipy 1.7.3
setuptools 65.5.0
Shapely 1.8.5.post1
sip 4.19.13
six 1.16.0
sklearn 0.0
terminaltables 3.1.10
threadpoolctl 3.1.0
tomli 2.0.1
tornado 6.2
tqdm 4.64.1
typeguard 2.13.3
typing_extensions 4.3.0
urllib3 1.26.12
visualdl 2.4.1
Werkzeug 2.2.2
wheel 0.37.1
wincertstore 0.2
xmltodict 0.13.0
zipp 3.10.0
(paddle_env) PS E:\Documents\code\GitHub\PaddleDetection> python -u tools/train.py -c .\configs\ppyoloe\ppyoloe_crn_x_300e_LYC_2019_12.yml --eval
W1123 21:33:49.114674 22704 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 11.6
W1123 21:33:49.118680 22704 gpu_resources.cc:91] device: 0, cuDNN Version: 8.5.
[11/23 21:33:50] ppdet.utils.checkpoint INFO: Finish loading model weights: pretrain_weights/CSPResNetb_x_pretrained.pdparams
[11/23 21:33:53] ppdet.engine INFO: Epoch: [0] [ 0/155] learning_rate: 0.000000 loss: -34113.761719 loss_cls: 0.155425 loss_iou: -22994.197266 loss_dfl: 46743.148438 loss_l1: 11.302123 eta: 1 day, 7:21:36 batch_cost: 2.4279 data_cost: 0.2216 ips: 1.6475 images/s
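These are my installed packages and CUDA environment. The GPU is a mobile RTX 3080, and the OS is Windows 11.
When I add the print, the error output shows nan: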
Error: C:\home\workspace\Paddle\paddle\phi\kernels\gpu\bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one)
failed. Input is expected to be within the interval [0, 1], but received nan.
Error: C:\home\workspace\Paddle\paddle\phi\kernels\gpu\bce_loss_kernel.cu:42 Assertion (x >= static_cast<T>(0)) && (x <= one)
failed. Input is expected to be within the interval [0, 1], but received nan.
Training runs normally on the CPU.
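The broken values are visible in the very first logged iteration, before the nan assertion fires. A minimal sketch of a sanity check over the logged loss terms follows; `find_suspicious_losses` is a hypothetical helper, and treating negative values as suspicious rests on the assumption that each of these loss terms should be non-negative.

```python
import math


def find_suspicious_losses(losses):
    """Return the names of loss terms that are nan/inf or negative."""
    return [
        name
        for name, value in losses.items()
        if not math.isfinite(value) or value < 0
    ]


# Values copied from the first logged iteration above.
logged = {
    "loss": -34113.761719,
    "loss_cls": 0.155425,
    "loss_iou": -22994.197266,
    "loss_dfl": 46743.148438,
    "loss_l1": 11.302123,
}
print(find_suspicious_losses(logged))

# A nan, such as the one reported when printing, is flagged the same way.
print(find_suspicious_losses({"assigned_scores_sum": float("nan")}))
```

Running a check like this on the first few iterations makes it easier to tell a numerical bug from a slowly diverging training run.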
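OK, I will find a Windows machine and try to reproduce it. It looks like the GPU kernel of some operator misbehaves on Windows and produces the nan.
Thanks for your effort.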
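First, I reproduced this problem on Windows. It is a bug in the paddle framework: the paddle.masked_select operator computes wrong results on the GPU. Screenshot attached:
The ppyoloe model uses this operator when computing the loss, which is what produces the nan in the later results.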
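One way to check a given installation for this behaviour is to compare `paddle.masked_select` on each device against a plain-Python reference. The sketch below is an assumption-laden illustration, not the maintainers' reproduction script: `masked_select_ref` and `paddle_masked_select` are hypothetical helper names, and the Paddle part only runs where Paddle (and, for `"gpu"`, a CUDA build) is installed.

```python
def masked_select_ref(values, mask):
    """Reference semantics: keep the elements whose mask entry is True, in order."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    return [v for v, keep in zip(values, mask) if keep]


def paddle_masked_select(values, mask, device):
    """Run paddle.masked_select on the given device ('cpu' or 'gpu')."""
    import paddle  # imported lazily so the reference above works without Paddle

    paddle.set_device(device)
    x = paddle.to_tensor(values, dtype="float32")
    m = paddle.to_tensor(mask, dtype="bool")
    return paddle.masked_select(x, m).numpy().tolist()


values = [0.5, 1.5, 2.5, 3.5]
mask = [True, False, True, False]
print(masked_select_ref(values, mask))

# With Paddle installed, both devices can be compared to the reference:
# paddle_masked_select(values, mask, "cpu") == masked_select_ref(values, mask)
# paddle_masked_select(values, mask, "gpu") == masked_select_ref(values, mask)
```

If the CPU result matches the reference and the GPU result does not, the installation shows the same symptom described in this thread.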
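Second, I could only reproduce the problem in a Python 3.7 environment. Under Python 3.9 everything works correctly. Screenshot attached:
Finally, I have reported this problem to the Paddle framework team, and a fix will be scheduled. So that you are not blocked in the meantime, I suggest installing the paddle develop build in a Python 3.9 environment and training the ppyoloe model there. We are very sorry for the inconvenience.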
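To see whether a machine matches the setup in which the bug was reproduced, the interpreter and platform can be checked directly. This is a minimal sketch based only on what this thread reports (Windows with Python 3.7); `matches_reported_setup` is a hypothetical helper and does not check the Paddle version or whether a GPU is in use.

```python
import sys


def matches_reported_setup(platform, python_version):
    """True for Windows + Python 3.7, the setup where the thread reproduced the bug."""
    return platform.startswith("win") and tuple(python_version[:2]) == (3, 7)


# Check the interpreter that is currently running.
print(matches_reported_setup(sys.platform, sys.version_info))
```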
OK, thanks. I am indeed on Python 3.7.