incubator-pegasus
Coredump in on_append_log_completed
Bug Report
Please answer these questions before submitting your issue. Thanks!
- What did you do? If possible, provide a recipe for reproducing the error.
  - Compaction writes per second: 4 GB.
  - Max flushed writes per second: 1 GB.
  - MultiSet CU = 600K.
  - MultiSet QPS = 40K.
- What did you expect to see?
No coredump.
- What did you see instead?
Coredump stack:
#0 0x00007f98eb83b1d7 in raise () from /lib64/libc.so.6
#1 0x00007f98eb83c8c8 in abort () from /lib64/libc.so.6
#2 0x00007f98ef7cb7be in dsn_coredump () at /home/wutao1/pegasus-release/rdsn/src/core/core/service_api_c.cpp:76
#3 0x00007f98ef6d5bde in dsn::replication::replica_stub::handle_log_failure (this=<optimized out>, err=...)
at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_stub.cpp:1919
#4 0x00007f98ef684215 in dsn::replication::replica::on_append_log_completed (this=0x389c300, mu=..., err=..., size=<optimized out>)
at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_2pc.cpp:455
#5 0x00007f98ef7dfd08 in operator() (__args#1=<optimized out>, __args#0=..., this=<optimized out>) at /home/wutao1/app/include/c++/4.8.2/functional:2464
#6 dsn::aio_task::exec (this=<optimized out>) at /home/wutao1/pegasus-release/rdsn/include/dsn/tool-api/task.h:600
#7 0x00007f98ef7dd8a9 in dsn::task::exec_internal (this=this@entry=0xb2ed0acd7) at /home/wutao1/pegasus-release/rdsn/src/core/core/task.cpp:180
#8 0x00007f98ef7f1a6d in dsn::task_worker::loop (this=0x2e3f8c0) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:211
#9 0x00007f98ef7f1c39 in dsn::task_worker::run_internal (this=0x2e3f8c0) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:191
#10 0x00007f98ec193600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#11 0x00007f98eccf8dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f98eb8fd73d in clone () from /lib64/libc.so.6
The mutation to append:
_appro_data_bytes = 1265,
The error:
$5 = {
  _internal_code = 20 // ERR_FILE_OPERATION_FAILED
}
- What version of Pegasus are you using?
1.12.1
Related code:
void replica_stub::handle_log_failure(error_code err)
{
    derror("handle log failure: %s", err.to_string());
    if (!s_not_exit_on_log_failure) {
        dassert(false, "TODO: better log failure handling ...");
    }
}
void replica::on_append_log_completed(mutation_ptr &mu, error_code err, size_t size)
{
    if (err == ERR_OK) {
        mu->set_logged();
    } else {
        derror("%s: append shared log failed for mutation %s, err = %s",
               name(),
               mu->name(),
               err.to_string());
    }
    ...
    if (err != ERR_OK) {
        // mutation log failure, propagate to all replicas
        _stub->handle_log_failure(err);
    }
mu->log_task() = _stub->_log->append(mu,
                                     LPC_WRITE_REPLICATION_LOG,
                                     &_tracker,
                                     std::bind(&replica::on_append_log_completed,
                                               this,
                                               mu,
                                               std::placeholders::_1,
                                               std::placeholders::_2),
                                     get_gpid().thread_hash(),
                                     &pending_size);
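To make the crash path easier to follow, here is a minimal standalone model of it. It only sketches the control flow visible in the snippets above: the bound callback receives the error code and size from the log append, and any non-OK error reaches handle_log_failure, which aborts unless the flag is set. All types here (error_code as an enum, mutation_model, stub_model, replica_model, make_append_callback) are hypothetical stand-ins and not the real rDSN API. The raw pointers replace mutation_ptr for brevity, and the abort is recorded in a flag instead of calling dassert.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-ins for the rDSN types named in the trace; not the real API.
// The value 20 is taken from the gdb output in this report.
enum class error_code { ERR_OK = 0, ERR_FILE_OPERATION_FAILED = 20 };

struct mutation_model {
    std::string name;
    bool logged = false;
};

struct stub_model {
    bool not_exit_on_log_failure = false; // models s_not_exit_on_log_failure
    int failures_seen = 0;
    error_code last_error = error_code::ERR_OK;
    bool abort_requested = false; // the real code calls dassert(false, ...) here

    void handle_log_failure(error_code err) {
        ++failures_seen;
        last_error = err;
        if (!not_exit_on_log_failure) {
            abort_requested = true;
        }
    }
};

struct replica_model {
    stub_model *stub;
    std::size_t bytes_logged;

    // Same shape as replica::on_append_log_completed: mark the mutation as
    // logged on success, otherwise hand the error to the stub.
    void on_append_log_completed(mutation_model *mu, error_code err, std::size_t size) {
        assert(mu != nullptr && stub != nullptr);
        if (err == error_code::ERR_OK) {
            mu->logged = true;
            bytes_logged += size;
        } else {
            stub->handle_log_failure(err);
        }
    }
};

// Signature of the completion callback the log append invokes in this model.
using aio_callback = std::function<void(error_code, std::size_t)>;

// Mirrors the std::bind call above: the replica and the mutation are bound up
// front, while the error code and size are filled in when the append completes.
aio_callback make_append_callback(replica_model *r, mutation_model *mu) {
    return std::bind(&replica_model::on_append_log_completed,
                     r,
                     mu,
                     std::placeholders::_1,
                     std::placeholders::_2);
}
```

Invoking the returned callback with ERR_FILE_OPERATION_FAILED leaves the mutation unlogged and sets abort_requested, which corresponds to frames #2 to #4 of the stack. Setting not_exit_on_log_failure to true in the model suppresses the abort but still counts the failure.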
Has the bug been fixed?