incubator-pegasus

Replica server crashed when handling a multiget request

Open zhangyifan27 opened this issue 3 years ago • 0 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.

During a rolling update of a cluster from 2.0-write-optim to 2.1.1, one of the replica servers crashed. The crashed server was task 19, which means 19 servers had already been updated successfully; this server crashed before its own update started.

  2. What did you expect to see? No crash.

  3. What did you see instead?

Coredump stack:

(gdb) bt
#0  0x00000000b16a6700 in ?? ()
#1  0x000000000059b44f in rocksdb::DBImpl::MultiGet (this=0x1a641b400, read_options=..., column_family=..., keys=..., values=0x7f77a7a068a0)
    at /home/wutao1/pegasus-release/rocksdb/db/db_impl/db_impl.cc:1742
#2  0x00000000005a79e1 in rocksdb::DB::MultiGet (this=<optimized out>, options=..., keys=..., values=0x7f77a7a068a0) at /home/wutao1/pegasus-release/rocksdb/include/rocksdb/db.h:450
#3  0x0000000000520136 in pegasus::server::pegasus_server_impl::on_multi_get (this=0x5bead00, request=..., reply=...)
    at /home/wutao1/pegasus-release/src/server/pegasus_server_impl.cpp:611
#4  0x000000000050030f in bool dsn::replication::storage_serverlet<dsn::apps::rrdb_service>::register_async_rpc_handler<dsn::apps::multi_get_request, dsn::apps::multi_get_response>(dsn::task_code, char const*, void (*)(dsn::apps::rrdb_service*, dsn::apps::multi_get_request const&, dsn::rpc_replier<dsn::apps::multi_get_response>&))::{lambda(dsn::apps::rrdb_service*, dsn::message_ex*)#1}::operator()(dsn::apps::rrdb_service*, dsn::message_ex*) const () at /home/wutao1/pegasus-release/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:29
#5  0x0000000000525b0c in operator() (__args#1=0x14cc4ea018, __args#0=0x5bead00, this=<optimized out>) at /home/wutao1/app/include/c++/4.8.2/functional:2464
#6  handle_request (request=0x14cc4ea018, this=0x5bead00) at /home/wutao1/pegasus-release/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:80
#7  dsn::apps::rrdb_service::on_request (this=0x5bead00, request=0x14cc4ea018) at /home/wutao1/pegasus-release/src/include/rrdb/rrdb.server.h:17
#8  0x00007f77e24d2f82 in dsn::replication::replica::on_client_read (this=0xb56fce680, request=request@entry=0x14cc4ea018)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica.cpp:186
#9  0x00007f77e2544a9f in dsn::replication::replica_stub::on_client_read (this=0x3554600, id=..., request=0x14cc4ea018)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_stub.cpp:807
#10 0x00007f77e266c219 in dsn::task::exec_internal (this=this@entry=0x14cc4ea1b0) at /home/wutao1/pegasus-release/rdsn/src/core/core/task.cpp:180
#11 0x00007f77e268046d in dsn::task_worker::loop (this=0x3a00c60) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:211
#12 0x00007f77e2680639 in dsn::task_worker::run_internal (this=0x3a00c60) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:191
#13 0x00007f77def4e600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#14 0x00007f77dfa60dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f77de6b873d in clone () from /lib64/libc.so.6

  4. What version of Pegasus are you using? pegasus-server-2.0-write-optim-984936d-glibc2.12-release

zhangyifan27, Apr 01 '21 02:04