
Replication crash when cancelling a replica while the heartbeat is running at the same time

Open adiholden opened this issue 2 years ago • 0 comments

Describe the bug

Crash when writing a journal entry from the heartbeat flow while a replica is being cancelled. change_cb_arr_ inside the JournalSlice class is not locked, so we can iterate over it, write to the sink, preempt, and then continue iterating over change_cb_arr_, which has changed in the meantime because UnregisterOnChange was called.

*** Check failure stack trace: ***
@ 0x5578ae6ddbbf google::LogMessage::Fail()
@ 0x5578ae6ddb05 google::LogMessage::SendToLog()
@ 0x5578ae6dd2da google::LogMessage::Flush()
@ 0x5578ae6e1152 google::LogMessageFatal::~LogMessageFatal()
@ 0x5578ad56b36a _ZZN13MainInitGuardC4EPiPPPcjENKUlvE_clEv
@ 0x5578ad56b38c _ZZN13MainInitGuardC4EPiPPPcjENUlvE_4_FUNEv
@ 0x7f4b8b2b628c (unknown)
@ 0x7f4b8b2b62f7 std::terminate()
@ 0x7f4b8b2b6558 __cxa_throw
@ 0x7f4b8b2ad6f4 std::__throw_bad_function_call()
@ 0x5578ae0e2c54 std::function<>::operator()()
@ 0x5578ae0df127 dfly::journal::JournalSlice::AddLogRecord()
Stopping instance on 1117
@ 0x5578ae0d4d86 dfly::journal::Journal::RecordEntry()
@ 0x5578ae0ca46d dfly::TriggerJournalWriteToSink()
@ 0x5578ae060481 dfly::EngineShard::Heartbeat()
@ 0x5578ae060510 dfly::EngineShard::RunPeriodic()
@ 0x5578ae059ba4 _ZZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sENKUlvE1_clEv
@ 0x5578ae071f79 ZSt13__invoke_implIvZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEET_St14__invoke_otherOT0_DpOT1
@ 0x5578ae06ebfa ZSt8__invokeIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS9_DpOSA
@ 0x5578ae06c036 _ZSt12__apply_implIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_St5tupleIJEEJEEDcOT_OT0_St16integer_sequenceImJXspT1_EEE
@ 0x5578ae06c0b3 ZSt5applyIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_St5tupleIJEEEDcOT_OT0
@ 0x5578ae06c2f7 _ZN5boost6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEE4run_EONS_7context5fiberE
@ 0x5578ae07fd77 ZSt13__invoke_implIN5boost7context5fiberERMNS0_6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFS2_OS2_ERPSD_JS2_EET_St21__invoke_memfun_derefOT0_OT1_DpOT2
@ 0x5578ae07f037 ZSt8__invokeIRMN5boost6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFNS0_7context5fiberEOSD_EJRPSB_SD_EENSt15__invoke_resultIT_JDpT0_EE4typeEOSL_DpOSM
@ 0x5578ae07e16a _ZNSt5_BindIFMN5boost6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFNS0_7context5fiberEOSD_EPSB_St12_PlaceholderILi1EEEE6__callISD_JSE_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
@ 0x5578ae07d227 ZNSt5_BindIFMN5boost6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFNS0_7context5fiberEOSD_EPSB_St12_PlaceholderILi1EEEEclIJSD_ESD_EET0_DpOT
I20230320 16:25:00.382179 1486404 dflycmd.cc:513] Disconnecting from replica 127.0.0.1:1117
I20230320 16:25:00.382246 1486404 dflycmd.cc:464] Replication error: Operation canceled: Context cancelled
@ 0x5578ae07bcfe ZSt13__invoke_implIN5boost7context5fiberERSt5_BindIFMNS0_6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFS2_OS2_EPSE_St12_PlaceholderILi1EEEEJS2_EET_St14__invoke_otherOT0_DpOT1
Stopping instance on 1115
@ 0x5578ae078d27 ZSt8__invokeIRSt5_BindIFMN5boost6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFNS1_7context5fiberEOSE_EPSC_St12_PlaceholderILi1EEEEJSE_EENSt15__invoke_resultIT_JDpT0_EE4typeEOSP_DpOSQ
@ 0x5578ae07728f ZSt6invokeIRSt5_BindIFMN5boost6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFNS1_7context5fiberEOSE_EPSC_St12_PlaceholderILi1EEEEJSE_EENSt13invoke_resultIT_JDpT0_EE4typeEOSP_DpOSQ
@ 0x5578ae075369 _ZN5boost7context6detail12fiber_recordINS0_5fiberENS0_21basic_fixedsize_stackINS0_12stack_traitsEEESt5_BindIFMNS_6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFS3_OS3_EPSI_St12_PlaceholderILi1EEEEE3runEPv
@ 0x5578ae072724 _ZN5boost7context6detail11fiber_entryINS1_12fiber_recordINS0_5fiberENS0_21basic_fixedsize_stackINS0_12stack_traitsEEESt5_BindIFMNS_6fibers14worker_contextIZN4dfly11EngineShardC4EPN4util12ProactorBaseEbP9mi_heap_sEUlvE1_JEEEFS4_OS4_EPSJ_St12_PlaceholderILi1EEEEEEEEvNS1_10transfer_tE
@ 0x7f4b8b45524f make_fcontext
*** SIGABRT received at time=1679322300 on cpu 0 ***
PC: @ 0x7f4b8a938a7c (unknown) pthread_kill
@ 0x5578ae7311df 64 absl::lts_20230125::WriteFailureInfo()
@ 0x5578ae7313e9 96 absl::lts_20230125::AbslFailureSignalHandler()
@ 0x7f4b8a8e4520 (unknown) (unknown)
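
The hazard described above (a callback list that changes while it is being iterated) can be reproduced without fibers. The following is only a minimal sketch, not Dragonfly's actual code or its fix: `SliceSketch`, `callbacks_`, `NumCallbacks` and `RunCancelDuringWrite` are hypothetical names, and a callback that unregisters another callback stands in for the preemption during the sink write. It shows one possible mitigation, iterating over a snapshot of the list, assuming the callbacks are cheap to copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for JournalSlice; names and layout are assumptions.
class SliceSketch {
 public:
  using ChangeCallback = std::function<void(int)>;

  std::uint32_t RegisterOnChange(ChangeCallback cb) {
    std::uint32_t id = next_id_++;
    callbacks_.emplace_back(id, std::move(cb));
    return id;
  }

  void UnregisterOnChange(std::uint32_t id) {
    for (auto it = callbacks_.begin(); it != callbacks_.end(); ++it) {
      if (it->first == id) {
        callbacks_.erase(it);
        return;
      }
    }
  }

  // Iterates over a copy of the list, so a registration change made while a
  // callback runs (standing in for a fiber preemption) cannot invalidate the
  // loop. Returns the number of callbacks that were invoked.
  int AddLogRecord(int entry) {
    auto snapshot = callbacks_;
    int invoked = 0;
    for (auto& item : snapshot) {
      item.second(entry);
      ++invoked;
    }
    return invoked;
  }

  std::size_t NumCallbacks() const { return callbacks_.size(); }

 private:
  std::uint32_t next_id_ = 1;
  std::vector<std::pair<std::uint32_t, ChangeCallback>> callbacks_;
};

// Simulates the race: the first callback unregisters the second one while
// AddLogRecord is still iterating.
std::pair<int, std::size_t> RunCancelDuringWrite() {
  SliceSketch slice;
  std::uint32_t second_id = 0;
  slice.RegisterOnChange([&](int) { slice.UnregisterOnChange(second_id); });
  second_id = slice.RegisterOnChange([](int) {});
  int invoked = slice.AddLogRecord(42);
  return {invoked, slice.NumCallbacks()};
}
```

A snapshot keeps the iteration valid, but a callback removed mid-iteration is still invoked once more, which is only acceptable if whatever it captures outlives the call. If that cannot be guaranteed, a lock that makes UnregisterOnChange wait until the iteration has finished is the safer choice.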

adiholden, Mar 20 '23 14:03