Bug: crash with BLPOP operation

Open Fabio3rs opened this issue 2 years ago • 5 comments

Describe the bug When multiple connections push to and pop from two queues, Dragonfly crashes. The problem only occurs when multiple threads are pushing to and popping from the queues at the same time.

To Reproduce Steps to reproduce the behavior:

  1. Compile the example code with clang++-12 main.cpp -std=c++17 -lPocoFoundation -lPocoNet -lPocoRedis -lpthread
  2. Start dragonfly with ./dragonfly or under a debugger
  3. Run the executable with ./a.out, then open a new terminal and run it again with ./a.out gateway
  4. The crash occurs after a second or more

I drew a simple diagram in GIMP to try to explain the example architecture: (image)

Expected behavior Should not crash

Screenshots

F20221030 19:29:08.073808  7077 engine_shard_set.cc:145] Check failed: continuation_trans_ == nullptr BLPOP@22/1 (178) when polling BLPOP@9/1 (178)
*** Check failure stack trace: ***
    @          0x21803aa  google::LogMessage::Fail()
    @          0x217f7be  google::LogMessage::SendToLog()
    @          0x21800da  google::LogMessage::Flush()
    @          0x2183a69  google::LogMessageFatal::~LogMessageFatal()
    @          0x1bc40d2  dfly::EngineShard::PollExecution()
    @          0x1cb6ea5  dfly::Transaction::ExecuteAsync()::$_6::operator()()
    @          0x1cb626d  std::__invoke_impl<>()
    @          0x1cb60fd  _ZSt10__invoke_rIvRZN4dfly11Transaction12ExecuteAsyncEvE3$_6JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES5_E4typeEOS6_DpOS7_
    @          0x1cb5d3d  std::_Function_handler<>::_M_invoke()
    @          0x1b39908  std::function<>::operator()()
    @          0x1e9f24c  util::fibers_ext::FiberQueue::Run()
    @          0x1bcb9a5  dfly::EngineShard::EngineShard()::$_2::operator()()
    @          0x1bcb79d  _ZN5boost7context6detail6invokeIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEENSt9enable_ifIXntsr3std17is_member_pointerINSt5decayIT_E4typeEEE5valueENSt9result_ofIFOSD_DpOT0_EE4typeEE4typeESH_SK_
    @          0x1bcb701  _ZN5boost7context6detail10apply_implIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEJEEEDTclsr5boost7context6detailE6invokeclsr3stdE7forwardIT_Efp_Espclsr3stdE3getIXT1_EEclsr3stdE7forwardIT0_Efp0_EEEEOSD_OSE_St16integer_sequenceImJXspT1_EEE
    @          0x1bcb65e  _ZN5boost7context6detail5applyIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEEEDTcl10apply_implclsr3stdE7forwardIT_Efp_Eclsr3stdE7forwardIT0_Efp0_Etl18__make_integer_seqISt16integer_sequencemXsr3std10tuple_sizeINSt5decayISE_E4typeEEE5valueEEEEEOSD_OSE_
    @          0x1bcab72  boost::fibers::worker_context<>::run_()
    @          0x1bcd4da  std::__invoke_impl<>()
    @          0x1bcd070  std::__invoke<>()
    @          0x1bccecd  _ZNSt5_BindIFMN5boost6fibers14worker_contextIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEEFNS0_7context5fiberEOSD_EPSB_St12_PlaceholderILi1EEEE6__callISD_JSE_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @          0x1bccc71  std::_Bind<>::operator()<>()
    @          0x1bccb18  _ZN5boost7context6detail6invokeIRSt5_BindIFMNS_6fibers14worker_contextIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEEFNS0_5fiberEOSF_EPSE_St12_PlaceholderILi1EEEEJSF_EEENSt9enable_ifIXntsr3std17is_member_pointerINSt5decayIT_E4typeEEE5valueENSt9result_ofIFOSR_DpOT0_EE4typeEE4typeESV_SY_
    @          0x1bcc8ce  boost::context::detail::fiber_record<>::run()
    @          0x1bcc03d  boost::context::detail::fiber_entry<>()
    @     0x7ffff7be01cf  make_fcontext

Environment (please complete the following information):

  • OS: Linux Mint 20.3
  • Kernel: Linux PC 5.15.0-52-generic #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: Bare metal
  • Dragonfly Version: main branch, commit fabad45d425a3b2ff7a282a3b318f85d9a764b3f (HEAD -> main, origin/main, origin/HEAD), Date: Wed Oct 26 10:36:50 2022 -0300
  • Compiler: Clang 12
  • CPU: AMD Ryzen 9 5900X

Reproducible Code Snippet

  • Uses the Poco library from the Ubuntu 20.04 repository (`sudo apt install libpoco-dev`) as the Redis client
  • Compile with clang++-12 main.cpp -std=c++17 -lPocoFoundation -lPocoNet -lPocoRedis -lpthread
  • This is a minimal program that reproduces the problem
  • Run first with ./a.out
  • Open another terminal instance and run ./a.out gateway
#include <Poco/Redis/Client.h>
#include <Poco/Redis/Command.h>
#include <Poco/Redis/Redis.h>
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <string_view>
#include <thread>
#include <utility>
#include <vector>

namespace {
auto blpop(Poco::Redis::Client &cli, const std::vector<std::string> &lista,
           int64_t timeout)
    -> std::optional<std::pair<std::string, std::string>> {
    std::optional<std::pair<std::string, std::string>> redisElement;

    auto result = cli.execute<Poco::Redis::Array>(
        Poco::Redis::Command::blpop(lista, timeout));

    if (result.isNull()) {
        return redisElement;
    }

    redisElement = {result.get<Poco::Redis::BulkString>(0).value(),
                    result.get<Poco::Redis::BulkString>(1).value()};

    return redisElement;
}

auto rpush(Poco::Redis::Client &inst,
           const std::pair<std::string, std::string> &data) -> int64_t {
    return inst.execute<Poco::Int64>(
        Poco::Redis::Command::rpush(data.first, data.second));
}

std::string msgcli = R"json({
    "Code": 200,
    "MessageData": {
        "AAAAAAA": {
            "SomeData4": "123456789",
            "SomeData3": "2",
            "SomeData2": "1",
            "SomeData1": "0"
        },
        "DATA": {
            "DATA": "DATA"
        },
        "OtherDATA": {
            "DATA": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras efficitur gravida tortor, quis porttitor odio dictum sit amet. Praesent laoreet tempor mauris. In facilisis libero sed dolor venenatis porttitor in a metus. Nunc id nulla nec libero elementum feugiat tristique ac nisl. In tempor sapien eu eleifend interdum. Vestibulum pellentesque turpis non ante sollicitudin, id ullamcorper neque condimentum. Integer accumsan dui id justo feugiat, et faucibus turpis congue. Nullam sagittis est sem, non commodo orci imperdiet vitae. Suspendisse potenti. Aliquam porta et nulla eget laoreet. Fusce gravida non erat non tincidunt. Phasellus congue ullamcorper massa, eget sodales tellus bibendum cursus."
        }
    },
    "Terminal": {
        "SomeKeyData": 1641946699702949624,
        "KeyID": "00000000-0000-0000-0000-000000000000",
        "KeyPub": "nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn"
    }
})json";

std::atomic<bool> running{true};

void replyInAnotherQueue() {
    Poco::Redis::Client cli("127.0.0.1", 6379);

    while (running) {
        if (auto value = blpop(cli, {"module_queue:targetqueue"}, 10)) {
            rpush(cli, {"gateway:xyz", value->second});
        }
    }
}

void pushThread() {
    Poco::Redis::Client cli("127.0.0.1", 6379);

    for (int i = 0; i < 100000000; i++) {
        rpush(cli, {"gateway:xyz", msgcli});
    }

    std::cout << "Gateway exit" << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
    running = false;
}

void popThread() {
    Poco::Redis::Client cli("127.0.0.1", 6379);

    while (running) {
        // Drain the queue; the popped value is intentionally discarded.
        blpop(cli, {"gateway:xyz"}, 10);
    }
}

std::vector<std::thread> threads;
} // namespace

int main(int argc, char **argv) {
    auto endargs = argv + argc;
    auto is_gateway =
        std::find(argv + 1, endargs, std::string_view("gateway")) != endargs;

    if (is_gateway) {
        std::cout << "is_gateway" << std::endl;
        threads.emplace_back(popThread);
        threads.emplace_back(popThread);
        threads.emplace_back(popThread);
        threads.emplace_back(popThread);
        threads.emplace_back(popThread);
        threads.emplace_back(popThread);
        threads.emplace_back(popThread);
        threads.emplace_back(pushThread);
        threads.emplace_back(pushThread);
        threads.emplace_back(pushThread);
        threads.emplace_back(pushThread);
        threads.emplace_back(pushThread);
    } else {
        std::cout << "reply response" << std::endl;
        threads.emplace_back(replyInAnotherQueue);
        threads.emplace_back(replyInAnotherQueue);
        threads.emplace_back(replyInAnotherQueue);
        threads.emplace_back(replyInAnotherQueue);
        threads.emplace_back(replyInAnotherQueue);
        threads.emplace_back(replyInAnotherQueue);
    }

    for (auto &thr : threads) {
        if (thr.joinable()) {
            thr.join();
        }
    }


    running = false;
}

Log from another run, this time with --alsologtostderr:

./dragonfly --alsologtostderr
I20221030 20:44:53.516671  7164 init.cc:58] ./dragonfly running in debug mode.
I20221030 20:44:53.523373  7164 dfly_main.cc:271] Starting dragonfly df-dev-0000000
I20221030 20:44:53.523394  7164 dfly_main.cc:294] maxmemory has not been specified. Deciding myself....
I20221030 20:44:53.523463  7164 dfly_main.cc:299] Found 28.04GiB available memory. Setting maxmemory to 22.43GiB
I20221030 20:44:53.554666  7184 proactor.cc:418] IORing with 1024 entries, allocated 102720 bytes, cq_entries is 2048
I20221030 20:44:53.578753  7164 proactor_pool.cc:66] Running 24 io threads
I20221030 20:44:53.581677  7164 server_family.cc:380] Data directory is "/mnt/projects/Projects/dragonfly/build"
I20221030 20:44:53.581770  7164 server_family.cc:128] Checking "/mnt/projects/Projects/dragonfly/build/dump"
I20221030 20:44:53.581895  7167 listener_interface.cc:87] sock[76] AcceptServer - listening on port 6379
F20221030 20:44:56.603554  7183 engine_shard_set.cc:148] Check failed: committed_txid_ == trans->notify_txid() (25 vs. 19) 
*** Check failure stack trace: ***
    @          0x21803aa  google::LogMessage::Fail()
    @          0x217f7be  google::LogMessage::SendToLog()
    @          0x21800da  google::LogMessage::Flush()
    @          0x2183a69  google::LogMessageFatal::~LogMessageFatal()
    @          0x1bc4314  dfly::EngineShard::PollExecution()
    @          0x1cb6ea5  dfly::Transaction::ExecuteAsync()::$_6::operator()()
    @          0x1cb626d  std::__invoke_impl<>()
    @          0x1cb60fd  _ZSt10__invoke_rIvRZN4dfly11Transaction12ExecuteAsyncEvE3$_6JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES5_E4typeEOS6_DpOS7_
    @          0x1cb5d3d  std::_Function_handler<>::_M_invoke()
    @          0x1b39908  std::function<>::operator()()
    @          0x1e9f24c  util::fibers_ext::FiberQueue::Run()
    @          0x1bcb9a5  dfly::EngineShard::EngineShard()::$_2::operator()()
    @          0x1bcb79d  _ZN5boost7context6detail6invokeIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEENSt9enable_ifIXntsr3std17is_member_pointerINSt5decayIT_E4typeEEE5valueENSt9result_ofIFOSD_DpOT0_EE4typeEE4typeESH_SK_
    @          0x1bcb701  _ZN5boost7context6detail10apply_implIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEJEEEDTclsr5boost7context6detailE6invokeclsr3stdE7forwardIT_Efp_Espclsr3stdE3getIXT1_EEclsr3stdE7forwardIT0_Efp0_EEEEOSD_OSE_St16integer_sequenceImJXspT1_EEE
    @          0x1bcb65e  _ZN5boost7context6detail5applyIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEEEDTcl10apply_implclsr3stdE7forwardIT_Efp_Eclsr3stdE7forwardIT0_Efp0_Etl18__make_integer_seqISt16integer_sequencemXsr3std10tuple_sizeINSt5decayISE_E4typeEEE5valueEEEEEOSD_OSE_
    @          0x1bcab72  boost::fibers::worker_context<>::run_()
    @          0x1bcd4da  std::__invoke_impl<>()
    @          0x1bcd070  std::__invoke<>()
    @          0x1bccecd  _ZNSt5_BindIFMN5boost6fibers14worker_contextIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEEFNS0_7context5fiberEOSD_EPSB_St12_PlaceholderILi1EEEE6__callISD_JSE_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @          0x1bccc71  std::_Bind<>::operator()<>()
    @          0x1bccb18  _ZN5boost7context6detail6invokeIRSt5_BindIFMNS_6fibers14worker_contextIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEEFNS0_5fiberEOSF_EPSE_St12_PlaceholderILi1EEEEJSF_EEENSt9enable_ifIXntsr3std17is_member_pointerINSt5decayIT_E4typeEEE5valueENSt9result_ofIFOSR_DpOT0_EE4typeEE4typeESV_SY_
    @          0x1bcc8ce  boost::context::detail::fiber_record<>::run()
    @          0x1bcc03d  boost::context::detail::fiber_entry<>()
    @     0x7f5a69fdd1cf  make_fcontext
*** SIGABRT received at time=1667173496 on cpu 18 ***
PC: @     0x7f5a69a9700b  (unknown)  raise
    @          0x21d8ba0         64  absl::lts_20220623::WriteFailureInfo()
    @          0x21d885d        240  absl::lts_20220623::AbslFailureSignalHandler()
    @     0x7f5a69c7d420       3792  (unknown)
    @          0x21803aa         16  google::LogMessage::Fail()
    @          0x217f7be        160  google::LogMessage::SendToLog()
    @          0x21800da        112  google::LogMessage::Flush()
    @          0x2183a69         48  google::LogMessageFatal::~LogMessageFatal()
    @          0x1bc4314       3008  dfly::EngineShard::PollExecution()
    @          0x1cb6ea5        864  dfly::Transaction::ExecuteAsync()::$_6::operator()()
    @          0x1cb626d         48  std::__invoke_impl<>()
    @          0x1cb60fd         48  std::__invoke_r<>()
    @          0x1cb5d3d         48  std::_Function_handler<>::_M_invoke()
    @          0x1b39908         64  std::function<>::operator()()
    @          0x1e9f24c        336  util::fibers_ext::FiberQueue::Run()
    @          0x1bcb9a5        240  dfly::EngineShard::EngineShard()::$_2::operator()()
    @          0x1bcb79d         48  boost::context::detail::invoke<>()
    @          0x1bcb701         64  boost::context::detail::apply_impl<>()
    @          0x1bcb65e         80  _ZN5boost7context6detail5applyIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEEEDTcl10apply_implclsr3stdE7forwardIT_Efp_Eclsr3stdE7forwardIT0_Efp0_Etl18__make_integer_seqISt16integer_sequencemXsr3std10tuple_sizeINSt5decayISE_E4typeEEE5valueEEEEEOSD_OSE_
    @          0x1bcab72        176  boost::fibers::worker_context<>::run_()
    @          0x1bcd4da        160  std::__invoke_impl<>()
    @          0x1bcd070        128  std::__invoke<>()
    @          0x1bccecd        160  std::_Bind<>::__call<>()
    @          0x1bccc71         96  std::_Bind<>::operator()<>()
    @          0x1bccb18         96  boost::context::detail::invoke<>()
    @          0x1bcc8ce        112  boost::context::detail::fiber_record<>::run()
    @          0x1bcc03d        208  boost::context::detail::fiber_entry<>()
    @     0x7f5a69fdd1cf  (unknown)  make_fcontext
Aborted (core dumped)


Fabio3rs avatar Oct 30 '22 22:10 Fabio3rs

Thanks @Fabio3rs ! @dranikpg take some time to see if you can reproduce inside list_family_test without Poco client. Otherwise, we should try adding it to pytests.

romange avatar Oct 31 '22 03:10 romange

@romange thanks for the fast response! It looks like a race condition to me. I haven't had enough time to study the project architecture and debug it, but I am available to run more tests if needed.

Fabio3rs avatar Oct 31 '22 19:10 Fabio3rs


TEST_F(ListFamilyTest, TwoQueueBug) {
  std::atomic_bool running{true};
  std::atomic_int it_cnt{0};

  auto popFiber = [&]() {
    auto id = "t-"+std::to_string(it_cnt.fetch_add(1));
    while (running.load()) {
      Run(id, {"blpop", "a", "10"});
    }
  };

  auto pushFiber = [&]() {
    auto id = "t-"+std::to_string(it_cnt.fetch_add(1));
    for (int i = 0; i < 1000; i++) {
      Run(id, {"rpush", "a", "DATA"});
    }
    ::boost::this_fiber::sleep_for(1s);
    running = false;
  };

  vector<boost::fibers::fiber> fbs;

  for (int i = 0; i < 5; i++) {
    unsigned t = i % pp_->size();
    fbs.push_back(std::move(pp_->at(t)->LaunchFiber(popFiber)));
  }

  for (int i = 0; i < 5; i++) {
    unsigned t = i % pp_->size();
    fbs.push_back(std::move(pp_->at(t)->LaunchFiber(pushFiber)));
  }

  while (running.load()) {
    this_fiber::sleep_for(50us);
  }

  for (auto& f : fbs)
    f.join();
}

dranikpg avatar Nov 03 '22 21:11 dranikpg

@dranikpg

Thanks! I don't know if it will help, but here is the log from this test.

[ctest] [ RUN      ] ListFamilyTest.TwoQueueBug
[ctest] I20221103 23:03:50.877496 62328 proactor_pool.cc:66] Running 4 io threads
[ctest] I20221103 23:03:50.878523 62328 server_family.cc:380] Data directory is "/mnt/projects/Projects/dragonfly/build/src/server"
[ctest] I20221103 23:03:50.878566 62328 test_utils.cc:148] Starting TwoQueueBug
[ctest] F20221103 23:03:50.879225 62872 engine_shard_set.cc:148] Check failed: committed_txid_ == trans->notify_txid() (160 vs. 157) 
[ctest] *** Check failure stack trace: ***
[ctest]     @          0x24ebbca  google::LogMessage::Fail()
[ctest]     @          0x24eafde  google::LogMessage::SendToLog()
[ctest]     @          0x24eb8fa  google::LogMessage::Flush()
[ctest]     @          0x24ef289  google::LogMessageFatal::~LogMessageFatal()
[ctest]     @          0x1e9b2c4  dfly::EngineShard::PollExecution()
[ctest]     @          0x1f8d745  dfly::Transaction::ExecuteAsync()::$_6::operator()()
[ctest]     @          0x1f8cb0d  std::__invoke_impl<>()
[ctest]     @          0x1f8c99d  _ZSt10__invoke_rIvRZN4dfly11Transaction12ExecuteAsyncEvE3$_6JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES5_E4typeEOS6_DpOS7_
[ctest]     @          0x1f8c5dd  std::_Function_handler<>::_M_invoke()
[ctest]     @          0x20b2cc8  std::function<>::operator()()
[ctest]     @          0x2256a6c  util::fibers_ext::FiberQueue::Run()
[ctest]     @          0x1ea2955  dfly::EngineShard::EngineShard()::$_2::operator()()
[ctest]     @          0x1ea274d  _ZN5boost7context6detail6invokeIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEENSt9enable_ifIXntsr3std17is_member_pointerINSt5decayIT_E4typeEEE5valueENSt9result_ofIFOSD_DpOT0_EE4typeEE4typeESH_SK_
[ctest]     @          0x1ea26b1  _ZN5boost7context6detail10apply_implIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEJEEEDTclsr5boost7context6detailE6invokeclsr3stdE7forwardIT_Efp_Espclsr3stdE3getIXT1_EEclsr3stdE7forwardIT0_Efp0_EEEEOSD_OSE_St16integer_sequenceImJXspT1_EEE
[ctest]     @          0x1ea260e  _ZN5boost7context6detail5applyIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEEEDTcl10apply_implclsr3stdE7forwardIT_Efp_Eclsr3stdE7forwardIT0_Efp0_Etl18__make_integer_seqISt16integer_sequencemXsr3std10tuple_sizeINSt5decayISE_E4typeEEE5valueEEEEEOSD_OSE_
[ctest]     @          0x1ea1b22  boost::fibers::worker_context<>::run_()
[ctest]     @          0x1ea448a  std::__invoke_impl<>()
[ctest]     @          0x1ea4020  std::__invoke<>()
[ctest]     @          0x1ea3e7d  _ZNSt5_BindIFMN5boost6fibers14worker_contextIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEEFNS0_7context5fiberEOSD_EPSB_St12_PlaceholderILi1EEEE6__callISD_JSE_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
[ctest]     @          0x1ea3c21  std::_Bind<>::operator()<>()
[ctest]     @          0x1ea3ac8  _ZN5boost7context6detail6invokeIRSt5_BindIFMNS_6fibers14worker_contextIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2JEEEFNS0_5fiberEOSF_EPSE_St12_PlaceholderILi1EEEEJSF_EEENSt9enable_ifIXntsr3std17is_member_pointerINSt5decayIT_E4typeEEE5valueENSt9result_ofIFOSR_DpOT0_EE4typeEE4typeESV_SY_
[ctest]     @          0x1ea387e  boost::context::detail::fiber_record<>::run()
[ctest]     @          0x1ea2fed  boost::context::detail::fiber_entry<>()
[ctest]     @     0x7fb097c571cf  make_fcontext
[ctest] *** SIGABRT received at time=1667527430 on cpu 0 ***
[ctest] PC: @     0x7fb09767e00b  (unknown)  raise
[ctest]     @          0x2543fa0         64  absl::lts_20220623::WriteFailureInfo()
[ctest]     @          0x2543c5d        240  absl::lts_20220623::AbslFailureSignalHandler()
[ctest]     @     0x7fb097864420       3792  (unknown)
[ctest]     @          0x24ebbca         16  google::LogMessage::Fail()
[ctest]     @          0x24eafde        160  google::LogMessage::SendToLog()
[ctest]     @          0x24eb8fa        112  google::LogMessage::Flush()
[ctest]     @          0x24ef289         48  google::LogMessageFatal::~LogMessageFatal()
[ctest]     @          0x1e9b2c4       3008  dfly::EngineShard::PollExecution()
[ctest]     @          0x1f8d745        864  dfly::Transaction::ExecuteAsync()::$_6::operator()()
[ctest]     @          0x1f8cb0d         48  std::__invoke_impl<>()
[ctest]     @          0x1f8c99d         48  std::__invoke_r<>()
[ctest]     @          0x1f8c5dd         48  std::_Function_handler<>::_M_invoke()
[ctest]     @          0x20b2cc8         64  std::function<>::operator()()
[ctest]     @          0x2256a6c        336  util::fibers_ext::FiberQueue::Run()
[ctest]     @          0x1ea2955        240  dfly::EngineShard::EngineShard()::$_2::operator()()
[ctest]     @          0x1ea274d         48  boost::context::detail::invoke<>()
[ctest]     @          0x1ea26b1         64  boost::context::detail::apply_impl<>()
[ctest]     @          0x1ea260e         80  _ZN5boost7context6detail5applyIZN4dfly11EngineShardC1EPN4util12ProactorBaseEbP9mi_heap_sE3$_2St5tupleIJEEEEDTcl10apply_implclsr3stdE7forwardIT_Efp_Eclsr3stdE7forwardIT0_Efp0_Etl18__make_integer_seqISt16integer_sequencemXsr3std10tuple_sizeINSt5decayISE_E4typeEEE5valueEEEEEOSD_OSE_
[ctest]     @          0x1ea1b22        176  boost::fibers::worker_context<>::run_()
[ctest]     @          0x1ea448a        160  std::__invoke_impl<>()
[ctest]     @          0x1ea4020        128  std::__invoke<>()
[ctest]     @          0x1ea3e7d        160  std::_Bind<>::__call<>()
[ctest]     @          0x1ea3c21         96  std::_Bind<>::operator()<>()
[ctest]     @          0x1ea3ac8         96  boost::context::detail::invoke<>()
[ctest]     @          0x1ea387e        112  boost::context::detail::fiber_record<>::run()
[ctest]     @          0x1ea2fed        208  boost::context::detail::fiber_entry<>()
[ctest]     @     0x7fb097c571cf  (unknown)  make_fcontext
[ctest] 

Fabio3rs avatar Nov 04 '22 02:11 Fabio3rs

@Fabio3rs can you please check out branch Bug451 and see if you can reproduce the crash?

Thanks!

romange avatar Nov 04 '22 16:11 romange

Thanks!

With that branch I can't reproduce the crash, and all the tests passed successfully.

My test code: (image)

I ran all the tests with a Debug build. For some reason, the Release build fails at the linking step:

[build] [1/1 100% :: 0.057] Linking CXX executable dragonfly
[build] FAILED: dragonfly 
[build] : && /usr/bin/clang++  -Wall -Wextra -g -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fno-omit-frame-pointer -Wno-unused-parameter -march=sandybridge -mtune=skylake  -std=c++20 -DHAS_RAWMEMCHR -fcolor-diagnostics -Wno-deprecated-copy  -O3 -DNDEBUG   src/server/CMakeFiles/dragonfly.dir/dfly_main.cc.o  -o dragonfly  lib/libbase.a  lib/libdragonfly_lib.a  lib/libepoll_fiber_lib.a  lib/libdfly_transaction.a  lib/libdfly_core.a  lib/liblua_modules.a  third_party/libs/lua/lib/liblua.a  -lcrypto  lib/libdfly_facade.a  lib/liburing_fiber_lib.a  third_party/libs/uring/lib/liburing.a  lib/libfibers_ext.a  lib/libio.a  lib/libhttp_server_lib.a  lib/libhttp_beast_prebuilt.a  /usr/lib/x86_64-linux-gnu/libboost_system.so.1.71.0  lib/libmetrics.a  third_party/libs/gperf/lib/libprofiler.a  -lunwind  lib/libtls_lib.a  /usr/lib/x86_64-linux-gnu/libssl.so  /usr/lib/x86_64-linux-gnu/libcrypto.so  third_party/libs/dconv/lib/libdouble-conversion.a  lib/libredis_lib.a  third_party/libs/mimalloc/lib/libmimalloc.a  lib/libstrings_lib.a  lib/libhtml_lib.a  _deps/abseil_cpp-build/absl/random/libabsl_random_distributions.a  _deps/abseil_cpp-build/absl/random/libabsl_random_seed_sequences.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_pool_urbg.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen_hwaes.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen_hwaes_impl.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_randen_slow.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_platform.a  _deps/abseil_cpp-build/absl/random/libabsl_random_internal_seed_material.a  _deps/abseil_cpp-build/absl/random/libabsl_random_seed_gen_exception.a  lib/libproactor_lib.a  lib/libbase.a  _deps/glog-build/libglog.a  /usr/lib/x86_64-linux-gnu/libunwind.so  _deps/abseil_cpp-build/absl/flags/libabsl_flags_parse.a  
_deps/abseil_cpp-build/absl/flags/libabsl_flags_usage.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_usage_internal.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_internal.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_marshalling.a  _deps/abseil_cpp-build/absl/strings/libabsl_str_format_internal.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_reflection.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_config.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_program_name.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_private_handle_accessor.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_commandlineflag.a  _deps/abseil_cpp-build/absl/flags/libabsl_flags_commandlineflag_internal.a  -latomic  -lrt  _deps/abseil_cpp-build/absl/debugging/libabsl_failure_signal_handler.a  _deps/abseil_cpp-build/absl/debugging/libabsl_examine_stack.a  third_party/libs/xxhash/lib/libxxhash.a  _deps/abseil_cpp-build/absl/strings/libabsl_cord.a  _deps/abseil_cpp-build/absl/strings/libabsl_cordz_info.a  _deps/abseil_cpp-build/absl/strings/libabsl_cord_internal.a  _deps/abseil_cpp-build/absl/strings/libabsl_cordz_functions.a  _deps/abseil_cpp-build/absl/strings/libabsl_cordz_handle.a  _deps/abseil_cpp-build/absl/hash/libabsl_hash.a  _deps/abseil_cpp-build/absl/hash/libabsl_city.a  _deps/abseil_cpp-build/absl/types/libabsl_bad_variant_access.a  _deps/abseil_cpp-build/absl/hash/libabsl_low_level_hash.a  _deps/abseil_cpp-build/absl/container/libabsl_raw_hash_set.a  _deps/abseil_cpp-build/absl/types/libabsl_bad_optional_access.a  _deps/abseil_cpp-build/absl/container/libabsl_hashtablez_sampler.a  _deps/abseil_cpp-build/absl/profiling/libabsl_exponential_biased.a  _deps/abseil_cpp-build/absl/synchronization/libabsl_synchronization.a  _deps/abseil_cpp-build/absl/debugging/libabsl_symbolize.a  _deps/abseil_cpp-build/absl/debugging/libabsl_demangle_internal.a  _deps/abseil_cpp-build/absl/debugging/libabsl_stacktrace.a  
_deps/abseil_cpp-build/absl/debugging/libabsl_debugging_internal.a  _deps/abseil_cpp-build/absl/synchronization/libabsl_graphcycles_internal.a  _deps/abseil_cpp-build/absl/base/libabsl_malloc_internal.a  _deps/abseil_cpp-build/absl/time/libabsl_time.a  _deps/abseil_cpp-build/absl/strings/libabsl_strings.a  _deps/abseil_cpp-build/absl/strings/libabsl_strings_internal.a  _deps/abseil_cpp-build/absl/base/libabsl_throw_delegate.a  _deps/abseil_cpp-build/absl/numeric/libabsl_int128.a  _deps/abseil_cpp-build/absl/time/libabsl_civil_time.a  _deps/abseil_cpp-build/absl/time/libabsl_time_zone.a  _deps/abseil_cpp-build/absl/base/libabsl_base.a  _deps/abseil_cpp-build/absl/base/libabsl_raw_logging_internal.a  -pthread  _deps/abseil_cpp-build/absl/base/libabsl_log_severity.a  _deps/abseil_cpp-build/absl/base/libabsl_spinlock_wait.a  -lrt  /usr/lib/x86_64-linux-gnu/libanl.a  /usr/lib/x86_64-linux-gnu/libboost_fiber.so.1.71.0  /usr/lib/x86_64-linux-gnu/libboost_context.so.1.71.0  /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.71.0 && :
[build] ld: error: undefined symbol: nonstd::expected_lite::expected<unsigned char, std::error_code> dfly::RdbLoaderBase::FetchInt<unsigned char>()
[build] >>> referenced by rdb_load.h:95 (../src/server/rdb_load.h:95)
[build] >>>               generic_family.cc.o:(std::_Function_handler<facade::OpStatus (dfly::Transaction*, dfly::EngineShard*), decltype(fp(this, nullptr)) dfly::Transaction::ScheduleSingleHopT<dfly::GenericFamily::Restore(absl::lts_20220623::Span<absl::lts_20220623::Span<char> >, dfly::ConnectionContext*)::$_12>(dfly::GenericFamily::Restore(absl::lts_20220623::Span<absl::lts_20220623::Span<char> >, dfly::ConnectionContext*)::$_12&&)::'lambda'(dfly::Transaction*, dfly::EngineShard*)>::_M_invoke(std::_Any_data const&, dfly::Transaction*&&, dfly::EngineShard*&&)) in archive lib/libdragonfly_lib.a
[build] clang: error: linker command failed with exit code 1 (use -v to see invocation)
[build] ninja: build stopped: subcommand failed.
[proc] The command: /usr/bin/cmake --build /mnt/projects/Projects/dragonfly/build --config Release --target dragonfly -- exited with code: 1 and signal: null
[build] Build finished with exit code 1
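
A note on the link error above: the undefined symbol is a member function template specialization, FetchInt<unsigned char>, referenced from an inline function at rdb_load.h:95. One common cause of a failure that shows up only in optimized builds (an assumption here, not something confirmed in this thread) is that the template is defined in a .cc file and relies on implicit instantiation there; with -O3 that instantiation can be inlined away and never emitted as a symbol. The usual remedy is an explicit instantiation definition next to the template definition. The sketch below collapses "header" and "source" into one file; Reader, FetchType, and the byte layout are hypothetical names for illustration, not Dragonfly's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// ---- Part that would live in a header (hypothetical names) ----
class Reader {
 public:
  explicit Reader(std::vector<unsigned char> buf) : buf_(std::move(buf)) {}

  // Declared here, defined in the "source" part below.
  template <typename T> T FetchInt();

  // Inline helper that other translation units would see and call.
  std::uint8_t FetchType() { return FetchInt<std::uint8_t>(); }

 private:
  std::vector<unsigned char> buf_;
  std::size_t pos_ = 0;
};

// ---- Part that would live in the .cc file ----
template <typename T> T Reader::FetchInt() {
  assert(pos_ + sizeof(T) <= buf_.size());  // stay inside the buffer
  T value{};
  std::memcpy(&value, buf_.data() + pos_, sizeof(T));  // host byte order
  pos_ += sizeof(T);
  return value;
}

// Explicit instantiation definitions: these force the symbols to be emitted
// in this translation unit regardless of the optimization level.
template std::uint8_t Reader::FetchInt<std::uint8_t>();
template std::uint32_t Reader::FetchInt<std::uint32_t>();
```

With the explicit instantiations in place, callers in other translation units can link against FetchInt<std::uint8_t> even when every use inside the defining file has been inlined.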

An observation: Clang is giving these warnings:

[build] ../src/server/bitops_family.cc:85:12: warning: local variable 'new_value' will be copied despite being returned by name [-Wreturn-std-move]
[build]     return new_value;
[build]            ^~~~~~~~~
[build] ../src/server/bitops_family.cc:85:12: note: call 'std::move' explicitly to avoid copying
[build]     return new_value;
[build]            ^~~~~~~~~
[build]            std::move(new_value)
[build] ../src/server/bitops_family.cc:444:27: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
[build]       values.emplace_back(std::move(GetString(find_res.value()->second, es)));
[build]                           ^
[build] ../src/server/bitops_family.cc:444:27: note: remove std::move call here
[build]       values.emplace_back(std::move(GetString(find_res.value()->second, es)));
[build]                           ^~~~~~~~~~                                       ~
[build] ../src/server/bitops_family.cc:85:12: warning: local variable 'new_value' will be copied despite being returned by name [-Wreturn-std-move]
[build]     return new_value;
[build]            ^~~~~~~~~
[build] ../src/server/bitops_family.cc:368:14: note: in instantiation of function template specialization 'dfly::(anonymous namespace)::BitOpString<unsigned char (*)(unsigned char, unsigned char), bool (*)(unsigned char)>' requested here
[build]       return BitOpString(OrOp, SkipOr, std::move(values), std::move(default_str));
[build]              ^
[build] ../src/server/bitops_family.cc:85:12: note: call 'std::move' explicitly to avoid copying
[build]     return new_value;
[build]            ^~~~~~~~~
[build]            std::move(new_value)
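
For context on -Wreturn-std-move: Clang reports it when a local variable is returned by name but its type differs from the return type in a way that, under the pre-C++20 rules, makes the compiler fall back to a copy. The sketch below shows one such case (a derived object returned as its base). Base, Derived, and the function names are hypothetical and are not taken from bitops_family.cc, whose actual types are not shown in this thread.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical types for illustration only.
struct Base {
    std::string payload;
    bool built_by_move = false;  // records which constructor produced us

    Base() = default;
    Base(const Base& other) : payload(other.payload), built_by_move(false) {}
    Base(Base&& other) noexcept
        : payload(std::move(other.payload)), built_by_move(true) {}
};

struct Derived : Base {};

// Returned by name: the local's type (Derived) differs from the return type
// (Base). Under the pre-C++20 rules this selects Base's copy constructor,
// which is what the warning points out. Compilers that implement the
// relaxed C++20 rules may move here instead.
Base ReturnByName() {
    Derived local;
    local.payload = "abc";
    return local;
}

// Explicit std::move: Base(Base&&) binds to the Base part of the xvalue, so
// the payload is moved regardless of which rules the compiler applies.
Base ReturnWithMove() {
    Derived local;
    local.payload = "abc";
    return std::move(local);
}

// Small self-check that can be called from main().
void CheckReturnExamples() {
    assert(ReturnWithMove().built_by_move);
    assert(ReturnByName().payload == "abc");
}
```

Whether the by-name version copies or moves depends on the compiler and language mode, so the self-check only asserts the move for the explicit std::move variant.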

[build] [535/552  96% :: 46.266] Building CXX object src/server/CMakeFiles/list_family_test.dir/list_family_test.cc.o
[build] ../src/server/list_family_test.cc:652:19: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
[build]     fbs.push_back(std::move(pp_->at(i)->LaunchFiber(pop_fiber)));
[build]                   ^
[build] ../src/server/list_family_test.cc:652:19: note: remove std::move call here
[build]     fbs.push_back(std::move(pp_->at(i)->LaunchFiber(pop_fiber)));
[build]                   ^~~~~~~~~~                                  ~
[build] ../src/server/list_family_test.cc:657:19: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
[build]     fbs.push_back(std::move(pp_->at(i)->LaunchFiber(push_fiber)));
[build]                   ^
[build] ../src/server/list_family_test.cc:657:19: note: remove std::move call here
[build]     fbs.push_back(std::move(pp_->at(i)->LaunchFiber(push_fiber)));
[build]                   ^~~~~~~~~~                                   ~
[build] 2 warnings generated.
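
The -Wpessimizing-move warnings are about wrapping a function's return value in std::move. The returned temporary is already an rvalue, so the call is redundant, and in an initialization such as `T x = std::move(f());` it also blocks copy elision. A minimal sketch of the pattern without the extra std::move follows; Handle and launch are hypothetical stand-ins for a move-only fiber handle and a LaunchFiber-style call, not the real types.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical move-only handle, standing in for a fiber or thread object.
struct Handle {
    int id;
    explicit Handle(int i) : id(i) {}
    Handle(Handle&& other) noexcept : id(other.id) { other.id = -1; }
    Handle& operator=(Handle&& other) noexcept {
        id = other.id;
        other.id = -1;
        return *this;
    }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

// Hypothetical factory that returns a temporary, as a launch call would.
Handle launch(int id) { return Handle(id); }

std::vector<Handle> launch_all(int n) {
    assert(n >= 0);
    std::vector<Handle> handles;
    for (int i = 0; i < n; ++i) {
        // launch(i) is already an rvalue, so push_back(Handle&&) is chosen
        // without std::move; adding it would only trigger the warning.
        handles.push_back(launch(i));
    }
    return handles;  // plain return by name: moved or elided
}
```

Because Handle is not copyable, the example only compiles if every transfer is a move, which shows that the plain `push_back(launch(i))` form does not copy.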

Fabio3rs avatar Nov 04 '22 22:11 Fabio3rs