Fleet ps on paddlecloud core dumps when stopping

Open anpark opened this issue 5 years ago • 3 comments

Env: paddle 1.5.1 with fleet, 1 ps, 2 trainers, using dataset

The ps and worker0 stopped successfully, but worker1 core dumped. Error messages:

trainer failed, exit_code=134
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called recursively
./thirdparty/paddle_cpu/bin/python: line 11: 2727 Aborted (core dumped) $SCRIPTPATH/python "$@"

anpark, Aug 31 '19 04:08

Which version of paddle are you using?

seiriosPlus, Sep 05 '19 07:09

@seiriosPlus 1.5.1

anpark, Oct 08 '19 12:10

Stack trace:

#0 0x00007f00f3f6ccbb in paddle::memory::allocation::Allocator::FreeImpl(paddle::memory::allocation::Allocation*) () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
(gdb) bt
#0 0x00007f00f3f6ccbb in paddle::memory::allocation::Allocator::FreeImpl(paddle::memory::allocation::Allocation*) () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
#1 0x00007f00f1f523c9 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
#2 0x00007f00f1f53308 in paddle::framework::Variable::PlaceholderImpl<paddle::framework::LoDTensor>::~PlaceholderImpl() () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
#3 0x00007f00f3f0355d in paddle::framework::Scope::~Scope() () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
#4 0x00007f00f2180fd4 in paddle::operators::distributed::Communicator::~Communicator() () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
#5 0x00007f00f2094bfa in std::_Sp_counted_ptr<paddle::operators::distributed::Communicator*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
#6 0x00007f00f1f523c9 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() () from /home//tools/paddle_release_home/paddle_gpu/lib/python2.7/site-packages/paddle/fluid/core_avx.so
#7 0x00007f015bb12eb9 in __run_exit_handlers () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#8 0x00007f015bb12f05 in exit () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#9 0x00007f015bafcbdc in __libc_start_main () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#10 0x00000000004007a1 in _start ()

anpark, Oct 08 '19 12:10