Paddle3D

Error during training

Open weihongwei-zg opened this issue 3 years ago • 4 comments

Version: smoke + hrnet. The error occurs during training (dla runs normally; the failure appears only with the hrnet config).
Environment: Ubuntu
paddle: 2.2.2
The error output is as follows:

W0901 14:51:33.908478 351 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 90.0, Driver API Version: 50013.0, Runtime API Version: 50013.0
W0901 14:51:33.908546 351 device_context.cc:460] device: 0, MIOpen Version: 2.15.1
/usr/local/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
/usr/local/lib/python3.7/site-packages/paddle/nn/layer/norm.py:653: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.")
Invalid address access: 0x7f8968e02000, Error code: 1.


C++ Traceback (most recent call last):

No stack trace in paddle, may be caused by external reasons.


Error Message Summary:

FatalError: Process abort signal is detected by the operating system.
  [TimeInfo: *** Aborted at 1662015117 (unix time) try "date -d @1662015117" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0x15f) received by PID 351 (TID 0x7f8aedf50700) from PID 351 ***]

train_sample.sh: line 26: 351 Aborted (core dumped) python tools/train.py --config configs/smoke/smoke_hrnet18_no_dcn_kitti.yml --iters 100 --log_interval 10 --save_interval 50

weihongwei-zg avatar Sep 01 '22 06:09 weihongwei-zg

@wobushihuair Judging from the error stack, this does not look like a Paddle problem. Could you check whether the system reports any other errors?
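Since Paddle prints no C++ stack here, one way to look for more detail might be Python's standard `faulthandler` module, which dumps the Python-level traceback when the process receives a fatal signal such as SIGABRT. The sketch below is an assumption, not something tried in this thread: `run_with_faulthandler` is a hypothetical helper, the `os.abort()` child is only a stand-in for the training script, and Paddle's own signal handling may take precedence so the extra output is not guaranteed.

```python
import subprocess
import sys


def run_with_faulthandler(args):
    """Run a child Python interpreter with faulthandler enabled.

    `args` is the argument list that would normally follow `python`.
    Returns the CompletedProcess so the caller can inspect stderr.
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    return subprocess.run(cmd, capture_output=True, text=True)


# Stand-in for the real training command: a child that aborts immediately.
demo = run_with_faulthandler(["-c", "import os; os.abort()"])

# A non-zero return code means the child did not exit cleanly.
print("return code:", demo.returncode)
# With faulthandler enabled, stderr carries the Python stack at the abort.
print(demo.stderr)
```

For the command in this issue, the equivalent would be `python -X faulthandler tools/train.py --config configs/smoke/smoke_hrnet18_no_dcn_kitti.yml --iters 100 --log_interval 10 --save_interval 50`.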

nepeplwu avatar Sep 01 '22 09:09 nepeplwu

No, this is the only error.

weihongwei-zg avatar Sep 01 '22 10:09 weihongwei-zg

You could try running the same command on an NVIDIA GPU to see whether it works normally. If it does, the model may have a compatibility issue on DCU.
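Before comparing the two machines, it may help to record which backend each Paddle installation was built for (the `MIOpen Version` line in the log above suggests a ROCm/HIP build). The following is a minimal sketch under that assumption: `describe_paddle_build` is a hypothetical helper, and it looks up `paddle.is_compiled_with_cuda` and `paddle.is_compiled_with_rocm` defensively in case a given Paddle version does not expose them.

```python
def describe_paddle_build():
    """Return basic facts about the installed Paddle build.

    Returns None when Paddle is not installed in this environment.
    """
    try:
        import paddle
    except ImportError:
        return None

    info = {"version": paddle.__version__}
    # Look the flags up by name so a missing attribute yields None
    # instead of raising AttributeError on other Paddle versions.
    for name in ("is_compiled_with_cuda", "is_compiled_with_rocm"):
        fn = getattr(paddle, name, None)
        info[name] = fn() if callable(fn) else None
    return info


print(describe_paddle_build())
```

Running this on both the DCU machine and the NVIDIA machine gives a quick record of the two builds to attach to the issue alongside the training log.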

nepeplwu avatar Sep 01 '22 12:09 nepeplwu

@wobushihuair I ran into a similar problem as well. Did you find a solution in the end?

RuotongWANG avatar Sep 14 '22 15:09 RuotongWANG

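This issue has had no feedback for a long time, so I am closing it for now. If the problem persists, please reopen it or create a new issue.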

nepeplwu avatar Feb 01 '24 06:02 nepeplwu