
Server crashed after restarting

Open: zhangyifan27 opened this issue 3 years ago • 1 comment

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.

Replica server restarted and crashed after a downtime of the cloud host.

  1. What did you expect to see?

Restart normally.

  1. What did you see instead?

Coredump stack:

Program terminated with signal 6, Aborted.
#0  0x00007f25689a51d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) #0  0x00007f25689a51d7 in raise () from /lib64/libc.so.6
#1  0x00007f25689a68c8 in abort () from /lib64/libc.so.6
#2  0x00007f256c4e49fe in dsn_coredump ()
    at /home/wutao1/pegasus-release/rdsn/src/core/core/service_api_c.cpp:76
#3  0x00000000006e9284 in pegasus::server::pegasus_server_impl::cancel_background_work (this=0x19b320800, wait=<optimized out>)
    at /home/wutao1/pegasus-release/src/server/pegasus_server_impl.cpp:1615
#4  0x00007f256c3d0a37 in dsn::replication::replica::close (this=0x4420ff80)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica.cpp:394
#5  0x00007f256c42ef6c in dsn::replication::replica_stub::close_replica (
    this=0x3236580, r=...)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_stub.cpp:1872
#6  0x00007f256c42f1c4 in operator() (__closure=<optimized out>)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_stub.cpp:1854
#7  std::_Function_handler<void(), dsn::replication::replica_stub::begin_close_replica(dsn::replication::replica_ptr)::__lambda30>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/wutao1/app/include/c++/4.8.2/functional:2071
#8  0x00007f256c4f6cd9 in dsn::task::exec_internal (this=this@entry=0xc558a2f)
    at /home/wutao1/pegasus-release/rdsn/src/core/core/task.cpp:180
#9  0x00007f256c50aa6d in dsn::task_worker::loop (this=0x2ec9550)
    at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:211
#10 0x00007f256c50ac39 in dsn::task_worker::run_internal (this=0x2ec9550)
    at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:191
#11 0x00007f25692fd600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#12 0x00007f2569f70dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2568a6773d in clone () from /lib64/libc.so.6
(gdb) quit
  1. What version of Pegasus are you using?

pegasus-server-1.11.6-9f4e5ae-glibc2.12-release

zhangyifan27, Apr 01 '21 03:04

https://github.com/apache/incubator-pegasus/blob/v1.11.6/src/server/pegasus_server_impl.cpp#L1613-L1617

void pegasus_server_impl::cancel_background_work(bool wait)
{
    dassert(_db != nullptr, ""); // crash here
    rocksdb::CancelAllBackgroundWork(_db, wait);
}

foreverneverer, Apr 01 '21 03:04
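
Frame #3 of the stack points at pegasus_server_impl.cpp:1615, which is the dassert in the snippet quoted above, so replica::close() appears to have reached cancel_background_work() while _db was still null. The thread does not say why _db was null or how the project resolved it. The following is only a minimal sketch of one possible mitigation (returning early instead of aborting), written against a simplified model: fake_db and server_model are hypothetical stand-ins, not Pegasus or RocksDB types.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the storage engine handle (not rocksdb::DB).
struct fake_db {
    bool background_work_cancelled = false;
};

// Hypothetical, simplified model of a server that owns a database handle.
class server_model {
public:
    // Models a successful open: the handle becomes non-null.
    void open() { _db.reset(new fake_db()); }

    // Returns true if background work was cancelled, and false if the
    // database was never opened. The early return replaces the hard
    // assertion, so closing a never-opened instance no longer aborts.
    bool cancel_background_work(bool wait)
    {
        if (_db == nullptr) {
            return false; // nothing to cancel
        }
        (void)wait; // the real call would pass this to the storage engine
        _db->background_work_cancelled = true;
        return true;
    }

private:
    std::unique_ptr<fake_db> _db;
};
```

Whether skipping silently is acceptable depends on why the handle is null during close. If that state indicates a failed open, logging it before returning would probably be safer than ignoring it.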