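Error during training
Setup: SMOKE + HRNet. The crash happens during training. The DLA config runs normally; only the HRNet config fails.
Environment: Ubuntu
paddle: 2.2.2
The error is as follows: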
W0901 14:51:33.908478 351 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 90.0, Driver API Version: 50013.0, Runtime API Version: 50013.0
W0901 14:51:33.908546 351 device_context.cc:460] device: 0, MIOpen Version: 2.15.1
/usr/local/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
if data.dtype == np.object:
/usr/local/lib/python3.7/site-packages/paddle/nn/layer/norm.py:653: UserWarning: When training, we now always track global mean and variance.
"When training, we now always track global mean and variance.")
Invalid address access: 0x7f8968e02000, Error code: 1.
C++ Traceback (most recent call last):
No stack trace in paddle, may be caused by external reasons.
Error Message Summary:
FatalError: Process abort signal is detected by the operating system.
[TimeInfo: *** Aborted at 1662015117 (unix time) try "date -d @1662015117" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x15f) received by PID 351 (TID 0x7f8aedf50700) from PID 351 ***]
train_sample.sh: line 26: 351 Aborted (core dumped) python tools/train.py --config configs/smoke/smoke_hrnet18_no_dcn_kitti.yml --iters 100 --log_interval 10 --save_interval 50
@wobushihuair Judging from the error stack, this does not look like a Paddle problem. Does the system report any other errors?
No, this is the only error.
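Since the Paddle log reports no stack trace, one possible way to collect more information is to rerun the same training command with Python's built-in faulthandler enabled through the PYTHONFAULTHANDLER environment variable, so that the interpreter dumps the Python-level traceback when a fatal signal such as SIGABRT arrives. This is only a minimal sketch: the helper names run_with_faulthandler and describe_exit and the log file name are hypothetical, and Paddle's own signal handling may still suppress the extra output.

```python
import os
import signal
import subprocess


def run_with_faulthandler(cmd, log_path="train_crash.log"):
    """Run cmd with PYTHONFAULTHANDLER=1 and capture stdout/stderr in log_path.

    Returns the child's return code. On POSIX, a negative value -N means
    the child was terminated by signal N.
    """
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown"
        return "killed by signal %d (%s)" % (-returncode, name)
    return "exited with code %d" % returncode


# Example call, using the command from the log above (not executed here):
# rc = run_with_faulthandler([
#     "python", "tools/train.py",
#     "--config", "configs/smoke/smoke_hrnet18_no_dcn_kitti.yml",
#     "--iters", "100", "--log_interval", "10", "--save_interval", "50",
# ])
# print(describe_exit(rc))
```

If the crash originates inside native GPU library code, the dumped traceback would only show which Python call was active at the time, which can still help narrow down the failing layer.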
You could try running the same command on an NVIDIA GPU. If it runs normally there, the model may have an adaptation issue on DCU.
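When comparing a log from the DCU machine with one from an NVIDIA GPU, the startup warnings from device_context.cc show which backend the Paddle build uses. The log above prints a MIOpen version, which points to a ROCm-based build. Below is a small sketch for classifying a saved log. The function name detect_backend is hypothetical, and the "cuDNN Version" marker for NVIDIA builds is an assumption that does not appear anywhere in this thread.

```python
import re


def detect_backend(log_text):
    """Guess the accelerator backend from Paddle's startup warning lines.

    Returns "rocm" if a MIOpen version line is present (as in the log above),
    "cuda" if a cuDNN version line is present (assumed marker for NVIDIA
    builds), and "unknown" otherwise.
    """
    if re.search(r"MIOpen Version", log_text):
        return "rocm"
    if re.search(r"cuDNN Version", log_text):
        return "cuda"
    return "unknown"
```

Running it on the logs from both machines makes it easy to confirm that the two runs really used different backends before drawing conclusions about DCU support.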
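@wobushihuair I ran into a similar problem. Did you find a solution in the end?
This issue has had no feedback for a long time, so we are closing it for now. If the problem persists, please reopen it or create a new issue.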