coredump on on_append_log_completed

Open neverchanje opened this issue 6 years ago • 1 comment

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
  • Compaction writes per second: 4 GB.
  • Max flushed writes per second: 1 GB.
  • MultiSet CU: 600K.
  • MultiSet QPS: 40K.
  2. What did you expect to see?

No coredump.

  3. What did you see instead?

Coredump stack:

#0  0x00007f98eb83b1d7 in raise () from /lib64/libc.so.6
#1  0x00007f98eb83c8c8 in abort () from /lib64/libc.so.6
#2  0x00007f98ef7cb7be in dsn_coredump () at /home/wutao1/pegasus-release/rdsn/src/core/core/service_api_c.cpp:76
#3  0x00007f98ef6d5bde in dsn::replication::replica_stub::handle_log_failure (this=<optimized out>, err=...)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_stub.cpp:1919
#4  0x00007f98ef684215 in dsn::replication::replica::on_append_log_completed (this=0x389c300, mu=..., err=..., size=<optimized out>)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_2pc.cpp:455
#5  0x00007f98ef7dfd08 in operator() (__args#1=<optimized out>, __args#0=..., this=<optimized out>) at /home/wutao1/app/include/c++/4.8.2/functional:2464
#6  dsn::aio_task::exec (this=<optimized out>) at /home/wutao1/pegasus-release/rdsn/include/dsn/tool-api/task.h:600
#7  0x00007f98ef7dd8a9 in dsn::task::exec_internal (this=this@entry=0xb2ed0acd7) at /home/wutao1/pegasus-release/rdsn/src/core/core/task.cpp:180
#8  0x00007f98ef7f1a6d in dsn::task_worker::loop (this=0x2e3f8c0) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:211
#9  0x00007f98ef7f1c39 in dsn::task_worker::run_internal (this=0x2e3f8c0) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:191
#10 0x00007f98ec193600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#11 0x00007f98eccf8dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f98eb8fd73d in clone () from /lib64/libc.so.6

The mutation to append:

  _appro_data_bytes = 1265, 

The error:

$5 = {
  _internal_code = 20 // ERR_FILE_OPERATION_FAILED
}
  4. What version of Pegasus are you using?

1.12.1

Related code:

void replica_stub::handle_log_failure(error_code err)
{
    derror("handle log failure: %s", err.to_string());
    if (!s_not_exit_on_log_failure) {
        dassert(false, "TODO: better log failure handling ...");
    }
}

void replica::on_append_log_completed(mutation_ptr &mu, error_code err, size_t size)
{
    if (err == ERR_OK) {
        mu->set_logged();
    } else {
        derror("%s: append shared log failed for mutation %s, err = %s",
               name(),
               mu->name(),
               err.to_string());
    }

    ...

    if (err != ERR_OK) {
        // mutation log failure, propagate to all replicas
        _stub->handle_log_failure(err);
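    }

        // Call site (excerpt): the shared-log append that binds
        // on_append_log_completed as its completion callback.
        mu->log_task() = _stub->_log->append(mu,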
                                             LPC_WRITE_REPLICATION_LOG,
                                             &_tracker,
                                             std::bind(&replica::on_append_log_completed,
                                                       this,
                                                       mu,
                                                       std::placeholders::_1,
                                                       std::placeholders::_2),
                                             get_gpid().thread_hash(),
                                             &pending_size);
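
The excerpts above chain together as follows: the append call binds on_append_log_completed as its completion callback, the callback receives the I/O result, and any non-OK result reaches handle_log_failure, which asserts unless s_not_exit_on_log_failure is set. Below is a minimal standalone sketch of that control flow. It is a model only: error_code, stub_model, replica_model and simulate_append are hypothetical stand-ins rather than rDSN types, and the abort is represented by a boolean return value instead of a real dassert.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT rDSN's real types.
enum class error_code { ERR_OK, ERR_FILE_OPERATION_FAILED };

struct stub_model {
    bool not_exit_on_log_failure = false; // models s_not_exit_on_log_failure
    std::vector<std::string> events;      // records events instead of logging them

    // Models replica_stub::handle_log_failure. Returns true where the real
    // code would reach dassert(false, ...) and abort the process.
    bool handle_log_failure(error_code /*err*/) {
        events.push_back("handle log failure");
        return !not_exit_on_log_failure;
    }
};

struct replica_model {
    stub_model *stub;
    bool logged; // models the effect of mu->set_logged()

    explicit replica_model(stub_model *s) : stub(s), logged(false) {
        assert(stub != nullptr);
    }

    // Models replica::on_append_log_completed.
    // Returns true if the process would abort on this completion.
    bool on_append_log_completed(error_code err, std::size_t /*size*/) {
        if (err == error_code::ERR_OK) {
            logged = true;
            return false;
        }
        stub->events.push_back("append shared log failed");
        // Any log failure is forwarded to the stub, as in the excerpt above.
        return stub->handle_log_failure(err);
    }
};

// Models the append call site: the completion callback is bound with
// std::bind and later invoked with the result of the write.
inline bool simulate_append(replica_model &r, error_code io_result, std::size_t size) {
    std::function<bool(error_code, std::size_t)> callback =
        std::bind(&replica_model::on_append_log_completed,
                  &r,
                  std::placeholders::_1,
                  std::placeholders::_2);
    return callback(io_result, size);
}
```

In this model, calling simulate_append with ERR_FILE_OPERATION_FAILED returns true (the abort path seen in the backtrace) unless not_exit_on_log_failure is set on the stub first, which mirrors the only escape hatch visible in the quoted handle_log_failure.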

neverchanje, Jan 09 '20 16:01

Has the bug been fixed?

acelyc111, Jul 31 '21 03:07