rr icon indicating copy to clipboard operation
rr copied to clipboard

rr replay segfaults during a first (and only) command `continue`

Open FooBarrior opened this issue 1 year ago • 14 comments

OS: Arch linux uname -r: 6.8.9-arch1-2 rr rev: 25063eb6 cpu: i9-13900HX

I am rr-ing MariaDB/server. The server execution goes fine, except for the following assertion in the end:

[FATAL src/RecordSession.cc:2028:process_syscall_entry()] 
 (task 40636 (rec:40636) at time 24384)
 -> Assertion `t->desched_rec() || is_rrcall_notify_syscall_hook_exit_syscall( t->regs().original_syscallno(), t->arch()) || t->ip() == t->vm() ->privileged_traced_syscall_ip() .increment_by_syscall_insn_length(t->arch())' failed to hold. Stashed signal pending on syscall entry when it shouldn't be: {signo:SIGSTOP,errno:SUCCESS,code:SI_TKILL}; regs={ ip:0x73fc470867fb args:(0xefffffffffffffff,0,0,0,0,0) orig_syscall: 24 syscallno: -38 }; last_execution_resume=0x73fc470867fb; sig ip=0x73fc470867fb

I'm used to have it, usually the replay worked fine after that.

However, the replay segfaults at some point now. rr built with -O0 -g produces the following core dump:

(gdb) bt full
#0  0x00005e7aedfc709d in rr::Task::tuid (this=0x0) at /usr/src/debug/rr-git/rr/src/Task.h:696
No locals.
#1  0x00005e7aee0666a5 in rr::extended_task_id (t=0x0) at /usr/src/debug/rr-git/rr/src/GdbServer.cc:54
No locals.
#2  0x00005e7aee066d7c in rr::matches_threadid (t=0x0, target=...) at /usr/src/debug/rr-git/rr/src/GdbServer.cc:142
No locals.
#3  0x00005e7aee06d1a2 in rr::compute_run_command_from_actions (t=0x0, req=..., signal_to_deliver=0x7fff5f3e9bd0) at /usr/src/debug/rr-git/rr/src/GdbServer.cc:1099
        action = @0x5e7aef6d82f0: {type = rr::ACTION_CONTINUE, target = {pid = 40636, tid = -1}, signal_to_deliver = 0}
        __for_range = std::vector of length 1, capacity 1 = {{type = rr::ACTION_CONTINUE, target = {pid = 40636, tid = -1}, signal_to_deliver = 0}}
        __for_begin = {type = rr::ACTION_CONTINUE, target = {pid = 40636, tid = -1}, signal_to_deliver = 0}
        __for_end = {type = (rr::ACTION_STEP | unknown: 0x4e), target = {pid = 0, tid = 33}, signal_to_deliver = 0}
#4  0x00005e7aee06f11d in rr::GdbServer::debug_one_step (this=0x7fff5f3ea520, last_resume_request=...) at /usr/src/debug/rr-git/rr/src/GdbServer.cc:1463
        signal_to_deliver = 0
        command = rr::RUN_CONTINUE
        __FUNCTION__ = "debug_one_step"
        result = {status = rr::REPLAY_CONTINUE, break_status = {task_context = {task = 0x0, session = 0x0, thread_group = std::shared_ptr<rr::ThreadGroup> (empty) = {get() = 0x0}}, watchpoints_hit = std::vector of length 0, capacity 0, 
            signal = std::unique_ptr<siginfo_t> = {get() = 0x0}, breakpoint_hit = false, singlestep_complete = false, approaching_ticks_target = false, task_exit = false}, did_fast_forward = false, incomplete_fast_forward = false}
        req = {type = rr::DREQ_CONT, target = {pid = -1, tid = -1}, suppress_debugger_stop = false, mem_ = {addr = 0, len = 0, data = std::vector of length 0, capacity 0}, watch_ = {addr = 0, kind = 0, 
            conditions = std::vector of length 0, capacity 0}, reg_ = {name = rr::DREG_EAX, {value = "H\235>_\377\177\000\000\000\000\000\000\000\000\000", value1 = 72 'H', value2 = 40264, value4 = 1597939016, value8 = 140734791327048}, 
            size = 103878083165952, defined = 112}, restart_ = {param = 0, param_str = "", type = rr::RESTART_FROM_PREVIOUS}, cont_ = {run_direction = rr::RUN_FORWARD, actions = std::vector of length 1, capacity 1 = {{
                type = rr::ACTION_CONTINUE, target = {pid = 40636, tid = -1}, signal_to_deliver = 0}}}, rr_cmd_ = {name = "", target_tid = -1, args = std::vector of length 0, capacity 0}, tls_ = {offset = 0, load_module = {ptr = 0}}, 
          sym_ = {has_address = false, address = {ptr = 0}, name = ""}, file_setfs_ = {pid = -1}, file_open_ = {file_name = "", flags = 0, mode = 0}, file_pread_ = {fd = -1, size = 0, offset = 0}, file_close_ = {fd = -1}, mem_alloc_ = {
            size = 0, prot = 0}, mem_free_ = {address = {ptr = 0}}, restore_register_state_ = {state_index = 0}}
        s = (rr::GdbServer::STOP_DEBUGGING | unknown: 0x7ffe)
#5  0x00005e7aee070f53 in rr::GdbServer::serve_replay (session=std::shared_ptr<rr::ReplaySession> (empty) = {...}, target=..., stop_replaying_to_target=0x5e7aee4c1bb0 <rr::stop_replaying_to_target>, flags=...)
    at /usr/src/debug/rr-git/rr/src/GdbServer.cc:1842
        connection = std::unique_ptr<rr::GdbServerConnection> = {get() = 0x0}
        server = {dbg = std::unique_ptr<rr::GdbServerConnection> = {get() = 0x5e7aeeb165c0}, debuggee_tguid = {tid_ = 40636, serial_ = 1}, target = {pid = 40636, require_exec = false, event = 14}, 
          thread_db = std::unique_ptr<rr::ThreadDb> = {get() = 0x5e7aef057460}, last_continue_task = {tguid = {tid_ = 40636, serial_ = 1}, tuid = {tid_ = 40636, serial_ = 1}}, last_query_task = {tguid = {tid_ = 40636, serial_ = 1}, 
            tuid = {tid_ = 40636, serial_ = 1}}, final_event = 4294967295, stop_siginfo = {si_signo = 5, si_errno = 0, si_code = 0, __pad0 = 0, _sifields = {_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {
                si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, 
                si_stime = 0}, _sigfault = {si_addr = 0x0, si_addr_lsb = 0, _bounds = {_addr_bnd = {_lower = 0x0, _upper = 0x0}, _pkey = 0}}, _sigpoll = {si_band = 0, si_fd = 0}, _sigsys = {_call_addr = 0x0, _syscall = 0, _arch = 0}}}, 
          in_debuggee_end_state = false, failed_restart = false, stop_replaying_to_target = 0x0, interrupt_pending = false, exit_sigkill_pending = false, timeline_ = 0x7fff5f3ea1d0, emergency_debug_session = 0x0, 
          debugger_restart_checkpoint = {mark = {ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aeeb29ac0}}, last_continue_task = {tguid = {tid_ = 40636, serial_ = 1}, tuid = {
                tid_ = 40636, serial_ = 1}}, is_explicit = rr::GdbServer::Checkpoint::EXPLICIT, where = "???"}, checkpoints = std::map with 0 elements, symbols = std::set with 70 elements = {[0] = "__nptl_create_event", 
            [1] = "__nptl_death_event", [2] = "__nptl_initial_report_events", [3] = "__nptl_last_event", [4] = "__nptl_nthreads", [5] = "__nptl_rtld_global", [6] = "__nptl_threads_events", [7] = "__nptl_version", [8] = "__pthread_keys", 
            [9] = "_dl_stack_used", [10] = "_dl_stack_user", [11] = "_dl_tls_dtv_slotinfo_list", [12] = "_thread_db___nptl_initial_report_events", [13] = "_thread_db___nptl_last_event", [14] = "_thread_db___nptl_nthreads", 
            [15] = "_thread_db___nptl_rtld_global", [16] = "_thread_db___pthread_keys", [17] = "_thread_db__dl_stack_used", [18] = "_thread_db__dl_stack_user", [19] = "_thread_db__dl_tls_dtv_slotinfo_list", 
            [20] = "_thread_db_const_thread_area", [21] = "_thread_db_dtv_dtv", [22] = "_thread_db_dtv_slotinfo_gen", [23] = "_thread_db_dtv_slotinfo_list_len", [24] = "_thread_db_dtv_slotinfo_list_next", 
            [25] = "_thread_db_dtv_slotinfo_list_slotinfo", [26] = "_thread_db_dtv_slotinfo_map", [27] = "_thread_db_dtv_t_counter", [28] = "_thread_db_dtv_t_pointer_val", [29] = "_thread_db_link_map_l_tls_modid", 
            [30] = "_thread_db_link_map_l_tls_offset", [31] = "_thread_db_list_t_next", [32] = "_thread_db_list_t_prev", [33] = "_thread_db_pthread_cancelhandling", [34] = "_thread_db_pthread_dtvp", [35] = "_thread_db_pthread_eventbuf", 
            [36] = "_thread_db_pthread_eventbuf_eventmask", [37] = "_thread_db_pthread_eventbuf_eventmask_event_bits", [38] = "_thread_db_pthread_key_data_data", [39] = "_thread_db_pthread_key_data_level2_data", 
            [40] = "_thread_db_pthread_key_data_seq", [41] = "_thread_db_pthread_key_struct_destr", [42] = "_thread_db_pthread_key_struct_seq", [43] = "_thread_db_pthread_list", [44] = "_thread_db_pthread_nextevent", 
            [45] = "_thread_db_pthread_report_events", [46] = "_thread_db_pthread_schedparam_sched_priority", [47] = "_thread_db_pthread_schedpolicy", [48] = "_thread_db_pthread_specific", [49] = "_thread_db_pthread_start_routine", 
            [50] = "_thread_db_pthread_tid", [51] = "_thread_db_register32", [52] = "_thread_db_register32_thread_area", [53] = "_thread_db_register64", [54] = "_thread_db_register64_thread_area", 
            [55] = "_thread_db_rtld_global__dl_stack_used", [56] = "_thread_db_rtld_global__dl_stack_user", [57] = "_thread_db_rtld_global__dl_tls_dtv_slotinfo_list", [58] = "_thread_db_sizeof_dtv_slotinfo", 
            [59] = "_thread_db_sizeof_dtv_slotinfo_list", [60] = "_thread_db_sizeof_list_t", [61] = "_thread_db_sizeof_pthread", [62] = "_thread_db_sizeof_pthread_key_data", [63] = "_thread_db_sizeof_pthread_key_data_level2", 
            [64] = "_thread_db_sizeof_pthread_key_struct", [65] = "_thread_db_sizeof_td_eventbuf_t", [66] = "_thread_db_sizeof_td_thr_events_t", [67] = "_thread_db_td_eventbuf_t_eventdata", [68] = "_thread_db_td_eventbuf_t_eventnum", 
            [69] = "_thread_db_td_thr_events_t_event_bits"}, symbols_iter = <error: Cannot access memory at address 0x46>, files = std::map with 0 elements, memory_files = std::map with 22 elements = {[0] = {device = 26, 
              inode = 6696520}, [1] = {device = 26, inode = 6616225}, [2] = {device = 26, inode = 6616239}, [3] = {device = 26, inode = 15333}, [4] = {device = 26, inode = 6329366}, [5] = {device = 26, inode = 483941}, [6] = {
              device = 26, inode = 6617055}, [7] = {device = 26, inode = 6280537}, [8] = {device = 26, inode = 6616244}, [9] = {device = 26, inode = 6299822}, [10] = {device = 26, inode = 6299820}, [11] = {device = 26, inode = 3214503}, 
            [12] = {device = 26, inode = 3213466}, [13] = {device = 26, inode = 6616264}, [14] = {device = 26, inode = 6616959}, [15] = {device = 26, inode = 6616234}, [16] = {device = 26, inode = 3140212}, [17] = {device = 26, 
              inode = 17281}, [18] = {device = 26, inode = 3140919}, [19] = {device = 26, inode = 3483768}, [20] = {device = 26, inode = 16725}, [21] = {device = 26, inode = 5263507}}, file_scope_pid = 40636, 
          debugger_mem = std::unordered_map with 0 elements, saved_register_states = std::unordered_map with 0 elements}
        last_resume_request = {type = rr::DREQ_CONT, target = {pid = -1, tid = -1}, suppress_debugger_stop = false, mem_ = {addr = 0, len = 0, data = std::vector of length 0, capacity 0}, watch_ = {addr = 0, kind = 0, 
            conditions = std::vector of length 0, capacity 0}, reg_ = {name = rr::DREG_EAX, {value = "\000\000\000\000\000\000\000\000\000\244>_\377\177\000", value1 = 0 '\000', value2 = 0, value4 = 0, value8 = 0}, 
            size = 103882071821377, defined = 240}, restart_ = {param = 0, param_str = "", type = rr::RESTART_FROM_PREVIOUS}, cont_ = {run_direction = rr::RUN_FORWARD, actions = std::vector of length 1, capacity 1 = {{
                type = rr::ACTION_CONTINUE, target = {pid = 40636, tid = -1}, signal_to_deliver = 0}}}, rr_cmd_ = {name = "", target_tid = -1, args = std::vector of length 0, capacity 0}, tls_ = {offset = 0, load_module = {ptr = 0}}, 
          sym_ = {has_address = false, address = {ptr = 0}, name = ""}, file_setfs_ = {pid = -1}, file_open_ = {file_name = "", flags = 0, mode = 0}, file_pread_ = {fd = -1, size = 0, offset = 0}, file_close_ = {fd = -1}, mem_alloc_ = {
            size = 0, prot = 0}, mem_free_ = {address = {ptr = 0}}, restore_register_state_ = {state_index = 0}}
        timeline = {current = std::shared_ptr<rr::ReplaySession> (use count 2, weak count 0) = {get() = 0x5e7aeeb11190}, current_at_or_after_mark = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 3, weak count 0) = {
            get() = 0x5e7aef7b71e0}, marks = std::map with 18 elements = {[{trace_time = 14, ticks = 0, step_key = {action = rr::TSTEP_RETIRE}}] = std::vector of length 1, capacity 1 = {
              std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aeeb15640}}, [{trace_time = 15, ticks = 0, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {
              std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aeeb29ac0}}, [{trace_time = 2216, ticks = 9755927, step_key = {
                action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef054b00}}, [{trace_time = 2220, ticks = 11590872, 
              step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef392ba0}}, [{trace_time = 2224, 
              ticks = 13425817, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aef7b7920}}, [{
              trace_time = 2228, ticks = 14403638, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {
                get() = 0x5e7aef6d8f80}}, [{trace_time = 6987, ticks = 80203198, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {
              std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef792b40}}, [{trace_time = 8330, ticks = 22658831, step_key = {
                action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef6edbf0}}, [{trace_time = 9104, ticks = 62183524, 
              step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef272b10}}, [{trace_time = 10929, 
              ticks = 5900139, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef734d30}}, [{
              trace_time = 12583, ticks = 53422450, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {
                get() = 0x5e7aef70ae90}}, [{trace_time = 14733, ticks = 73056576, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {
              std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef5b5e10}}, [{trace_time = 16667, ticks = 82922062, step_key = {
                action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aef6f6f20}}, [{trace_time = 18282, ticks = 733569, 
              step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {get() = 0x5e7aef76a8b0}}, [{trace_time = 19766, 
              ticks = 101903933, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aef6f3640}}, [{
              trace_time = 21850, ticks = 115252354, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 1, weak count 0) = {
                get() = 0x5e7aef72b0a0}}, [{trace_time = 22828, ticks = 137583299, step_key = {action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {
              std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aef6decf0}}, [{trace_time = 23740, ticks = 94607957, step_key = {
                action = rr::TSTEP_NONE}}] = std::vector of length 1, capacity 1 = {std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 3, weak count 0) = {get() = 0x5e7aef7b71e0}}}, 
          marks_with_checkpoints = std::map with 7 elements = {[{trace_time = 14, ticks = 0, step_key = {action = rr::TSTEP_RETIRE}}] = 1, [{trace_time = 15, ticks = 0, step_key = {action = rr::TSTEP_NONE}}] = 1, [{trace_time = 2224, 
              ticks = 13425817, step_key = {action = rr::TSTEP_NONE}}] = 1, [{trace_time = 16667, ticks = 82922062, step_key = {action = rr::TSTEP_NONE}}] = 1, [{trace_time = 19766, ticks = 101903933, step_key = {
                action = rr::TSTEP_NONE}}] = 1, [{trace_time = 22828, ticks = 137583299, step_key = {action = rr::TSTEP_NONE}}] = 1, [{trace_time = 23740, ticks = 94607957, step_key = {action = rr::TSTEP_NONE}}] = 1}, 
          breakpoints = std::set with 4 elements = {[0] = std::tuple containing = {[0] = {<rr::TaskishUid<rr::AddressSpace>> = {tid_ = 40636, serial_ = 1}, exec_count_ = 1}, [1] = {ptr = 101724920227805}, 
              [2] = std::unique_ptr<rr::BreakpointCondition> = {get() = 0x0}}, [1] = std::tuple containing = {[0] = {<rr::TaskishUid<rr::AddressSpace>> = {tid_ = 40636, serial_ = 1}, exec_count_ = 1}, [1] = {ptr = 127527360766362}, 
              [2] = std::unique_ptr<rr::BreakpointCondition> = {get() = 0x0}}, [2] = std::tuple containing = {[0] = {<rr::TaskishUid<rr::AddressSpace>> = {tid_ = 40636, serial_ = 1}, exec_count_ = 1}, [1] = {ptr = 127527360810295}, 
              [2] = std::unique_ptr<rr::BreakpointCondition> = {get() = 0x0}}, [3] = std::tuple containing = {[0] = {<rr::TaskishUid<rr::AddressSpace>> = {tid_ = 40636, serial_ = 1}, exec_count_ = 1}, [1] = {ptr = 127527360887162}, 
              [2] = std::unique_ptr<rr::BreakpointCondition> = {get() = 0x0}}}, watchpoints = std::set with 0 elements, breakpoints_applied = true, reverse_execution_barrier_event_ = 15, 
          reverse_exec_checkpoints = std::map with 6 elements = {[{ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aeeb15640}}] = 38740, [{
              ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aef7b7920}}] = 4027258, [{ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {
                get() = 0x5e7aef6f6f20}}] = 8238909, [{ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aef6f3640}}] = 9239550, [{
              ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 2, weak count 0) = {get() = 0x5e7aef6decf0}}] = 10244834, [{ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (use count 3, weak count 0) = {
                get() = 0x5e7aef7b71e0}}] = 10749159}, no_watchpoints_hit_interval_start = {ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (empty) = {get() = 0x0}}, no_watchpoints_hit_interval_end = {
            ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (empty) = {get() = 0x0}}, reverse_exec_short_checkpoint = {ptr = std::shared_ptr<rr::ReplayTimeline::InternalMark> (empty) = {get() = 0x0}}}
        result = {status = rr::REPLAY_CONTINUE, break_status = {task_context = {task = 0x5e7aeeb12810, session = 0x5e7aeeb11190, thread_group = std::shared_ptr<rr::ThreadGroup> (use count 2, weak count 0) = {get() = 0x5e7aeeb12360}}, 
            watchpoints_hit = std::vector of length 0, capacity 0, signal = std::unique_ptr<siginfo_t> = {get() = 0x0}, breakpoint_hit = false, singlestep_complete = false, approaching_ticks_target = false, task_exit = false}, 
          did_fast_forward = false, incomplete_fast_forward = false}
        __FUNCTION__ = "serve_replay"
        port = 50505
        probe = rr::DONT_PROBE
        t = 0x5e7aeeb12810
        listen_socket = {fd = {fd = 7}, domain = 2, host = "127.0.0.1", port = 50505}
        first_run_event = 15
#6  0x00005e7aee1d2779 in rr::replay (trace_dir="/home/nik/mariadb/bld/mysql-test/var/log/mysqld.1.rr/latest-trace", flags=...) at /usr/src/debug/rr-git/rr/src/ReplayCommand.cc:512
        session = std::shared_ptr<rr::ReplaySession> (use count 2, weak count 0) = {get() = 0x5e7aeeb11190}
        conn_flags = {dbg_port = 50505, dbg_host = "", keep_listening = false, serve_files = false, debugger_params_write_pipe = 0x0, debugger_name = "gdb"}
        target = {pid = 0, require_exec = false, event = 0}
        debugger_params_pipe = {24, 0}
        __FUNCTION__ = "replay"
        sa = {__sigaction_handler = {sa_handler = 0x780000c549, sa_sigaction = 0x780000c549}, sa_mask = {__val = {140734791333544, 0, 0, 0, 16777216, 0, 140734791333592, 3, 6448231, 0, 0, 0, 0, 0, 0, 3258686624207601664}}, 
          sa_flags = 1751347809, sa_restorer = 0x0}
#7  0x00005e7aee1d35fe in rr::ReplayCommand::run (this=0x5e7aee4c1b90 <rr::ReplayCommand::singleton>, args=std::vector of length 0, capacity 4) at /usr/src/debug/rr-git/rr/src/ReplayCommand.cc:682
        found_dir = true
        trace_dir = "/home/nik/mariadb/bld/mysql-test/var/log/mysqld.1.rr/latest-trace"
        flags = {goto_event = 0, singlestep_to_event = 0, target_process = 0, target_command = "", process_created_how = rr::ReplayFlags::CREATED_NONE, dont_launch_debugger = true, dbg_port = 50505, dbg_host = "", 
          keep_listening = false, gdb_options = std::vector of length 0, capacity 0, gdb_binary_file_path = "gdb", redirect = true, cpu_unbound = false, share_private_mappings = false, retry_transient_errors = false, dump_interval = 0, 
          serve_files = false, tty = "", intel_pt_start_checking_event = -1}
#8  0x00005e7aee0d14ec in main (argc=5, argv=0x7fff5f3ebcf8) at /usr/src/debug/rr-git/rr/src/main.cc:278
        args = std::vector of length 0, capacity 4
        command = 0x5e7aee4c1b90 <rr::ReplayCommand::singleton>
(gdb) f 3
#3  0x00005e7aee06d1a2 in rr::compute_run_command_from_actions (t=0x0, req=..., signal_to_deliver=0x7fff5f3e9bd0) at /usr/src/debug/rr-git/rr/src/GdbServer.cc:1099
1099	    if (matches_threadid(t, action.target)) {
(gdb) p req
$1 = (const rr::GdbRequest &) @0x7fff5f3e9cd0: {type = rr::DREQ_CONT, target = {pid = -1, tid = -1}, suppress_debugger_stop = false, mem_ = {addr = 0, len = 0, data = std::vector of length 0, capacity 0}, watch_ = {addr = 0, kind = 0, 
    conditions = std::vector of length 0, capacity 0}, reg_ = {name = rr::DREG_EAX, {value = "H\235>_\377\177\000\000\000\000\000\000\000\000\000", value1 = 72 'H', value2 = 40264, value4 = 1597939016, value8 = 140734791327048}, 
    size = 103878083165952, defined = 112}, restart_ = {param = 0, param_str = "", type = rr::RESTART_FROM_PREVIOUS}, cont_ = {run_direction = rr::RUN_FORWARD, actions = std::vector of length 1, capacity 1 = {{
        type = rr::ACTION_CONTINUE, target = {pid = 40636, tid = -1}, signal_to_deliver = 0}}}, rr_cmd_ = {name = "", target_tid = -1, args = std::vector of length 0, capacity 0}, tls_ = {offset = 0, load_module = {ptr = 0}}, sym_ = {
    has_address = false, address = {ptr = 0}, name = ""}, file_setfs_ = {pid = -1}, file_open_ = {file_name = "", flags = 0, mode = 0}, file_pread_ = {fd = -1, size = 0, offset = 0}, file_close_ = {fd = -1}, mem_alloc_ = {size = 0, 
    prot = 0}, mem_free_ = {address = {ptr = 0}}, restore_register_state_ = {state_index = 0}}
(gdb) 
$2 = (const rr::GdbRequest &) @0x7fff5f3e9cd0: {type = rr::DREQ_CONT, target = {pid = -1, tid = -1}, suppress_debugger_stop = false, mem_ = {addr = 0, len = 0, data = std::vector of length 0, capacity 0}, watch_ = {addr = 0, kind = 0, 
    conditions = std::vector of length 0, capacity 0}, reg_ = {name = rr::DREG_EAX, {value = "H\235>_\377\177\000\000\000\000\000\000\000\000\000", value1 = 72 'H', value2 = 40264, value4 = 1597939016, value8 = 140734791327048}, 
    size = 103878083165952, defined = 112}, restart_ = {param = 0, param_str = "", type = rr::RESTART_FROM_PREVIOUS}, cont_ = {run_direction = rr::RUN_FORWARD, actions = std::vector of length 1, capacity 1 = {{
        type = rr::ACTION_CONTINUE, target = {pid = 40636, tid = -1}, signal_to_deliver = 0}}}, rr_cmd_ = {name = "", target_tid = -1, args = std::vector of length 0, capacity 0}, tls_ = {offset = 0, load_module = {ptr = 0}}, sym_ = {
    has_address = false, address = {ptr = 0}, name = ""}, file_setfs_ = {pid = -1}, file_open_ = {file_name = "", flags = 0, mode = 0}, file_pread_ = {fd = -1, size = 0, offset = 0}, file_close_ = {fd = -1}, mem_alloc_ = {size = 0, 
    prot = 0}, mem_free_ = {address = {ptr = 0}}, restore_register_state_ = {state_index = 0}}
(gdb) 

Let me know if you need anything extra (like the core dump itself, or the rr record)

FooBarrior avatar May 13 '24 14:05 FooBarrior

It seems that the failure happens closer to the end of run, so the signal issue is likely related.

Steps to reproduce, roughly (Arch Linux):

git clone --depth=1 --branch 11.5 https://github.com/MariaDB/server.git mariadb
mkdir mariadb/bld
cd mariadb/bld
cmake -G Ninja \
-DMYSQL_MAINTAINER_MODE=WARN \
-DWITH_WSREP=0 \
-DPLUGIN_WSREP=1 \
-DWITH_VALGRIND=0 \
-DWITH_ASAN=1 \
-DWITH_MSAN=0 \
-DWITH_TSAN=0 \
-DWITH_UBSAN=0 \
-DWITH_GPROF=0 \
-DPLUGIN_CONNECTOR_C=NO \
-DENABLE_DTRACE=0 \
-DWITH_PCRE=bundled \
-DWITH_EMBEDDED_SERVER=no \
-DHAVE_EMBEDDED_PRIVILEGE_CONTROL=0 \
-DWITH_UNIT_TESTS=yes \
-DWITH_UNITTEST=0 \
-DSECURITY_HARDENED=0 \
-DWITH_SAFEMALLOC=0 \
-DPLUGIN_PERFSCHEMA=YES \
-DPLUGIN_TOKUDB=NO \
-DPLUGIN_SQL_ERRLOG=NO \
-DPLUGIN_ROCKSDB=NO \
-DPLUGIN_XPAND=NO \
-DWITH_COLUMNSTORE_STORAGE_ENGINE=YES \
-DPLUGIN_MROONGA=NO \
-DPLUGIN_CONNECT=NO \
-DPLUGIN_SPHINX=NO \
-DPLUGIN_SPIDER=NO \
-DPLUGIN_S3=NO \
-DPLUGIN_ARCHIVE=NO \
-DPLUGIN_BLACKHOLE=YES \
-DPLUGIN_CRACKLIB=no \
-DPLUGIN_TYPE_GEOM=no \
-Dplugin_file_key_management=YES \
-DWITH_MARIABACKUP=NO \
-DCMAKE_EXPORT_COMPILE_COMMANDS=0 \
-DWITH_INNODB_SNAPPY=OFF \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++

ninja minbuild
cd mysql-test
./mtr --mem main.1st --rr

rr replay var/log/mysqld.1.rr/latest-trace

Notes:

  1. My 13900HX is little-big, so I use taskset -c 0-15 for both record and replay. However I had that assertion on a simpler intel cpu as well
  2. The assertion failure is very stable, so the specific branch, test case, even build options and compiler might not be necessary. main.1st is a simplest possible test, and I have provided the fastest build config we have.
  3. For a better reproducibility, the commit ref I tested on is 929c2e06.

FooBarrior avatar May 13 '24 15:05 FooBarrior

Thanks for doing all that work to come up with steps to reproduce. Unfortunately, following them doesn't reproduce a bug for me. rr replay -a works normally with no errors. So does rr replay var/log/mysqld.1.rr/latest-trace followed by continue. What exact debugger commands are you running?

Re-replaying with this extra commit b7c3913039592e2a7ce58b3d5c7ff37f0df5a6fb might give us a more helpful error report. Please attach an rr dump along with the error text.

rocallahan avatar May 14 '24 11:05 rocallahan

@rocallahan did you get the assertion failureduring rr record? What's your OS?

FooBarrior avatar May 14 '24 11:05 FooBarrior

I don't. I'm on Fedora 39 kernel 6.8.6-200.fc39.x86_64.

roc@localhost:~/mariadb/bld/mysql-test$ ./mtr --mem main.1st --rr
Logging: /home/roc/mariadb/mysql-test/mariadb-test-run.pl  --mem main.1st --rr
VS config: 
vardir: /home/roc/mariadb/bld/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/roc/mariadb/bld/mysql-test/var'...
 - symlinking 'var' to '/dev/shm/var_auto__ydS'
Checking supported features...
MariaDB Version 11.5.0-MariaDB
 - SSL connections supported
Collecting tests...
Installing system database...

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[01] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
main.1st                                 [ pass ]      5
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.005 of 15 seconds executing testcases

Completed: All 1 tests were successful.

roc@localhost:~/mariadb/bld/mysql-test$ 

rocallahan avatar May 14 '24 11:05 rocallahan

The crash during replay probably is related to the fact that your recording terminated with an assertion. Maybe we should debug that recording assertion first...

rocallahan avatar May 14 '24 11:05 rocallahan

You could try recording with RR_LOG=all and dumping that log somewhere I can get it.

rocallahan avatar May 14 '24 11:05 rocallahan

Also, does your test harness send SIGSTOP to the tracee?

rocallahan avatar May 14 '24 11:05 rocallahan

Also, does your test harness send SIGSTOP to the tracee?

No. I asked the colleagues as well, we are concerned that the server doesn't do anything with SIGSTOP, nor does our test environment.

You may also want to ckeck the output of ~/mariadb/bld/mysql-test/var/log/mysqld.1.err. Mine usually has this in the end:

2024-05-13 21:04:17 0 [Note] /home/nik/mariadb/bld/sql/mariadbd: Shutdown complete

[FATAL src/RecordSession.cc:2028:process_syscall_entry()] 
 (task 148435 (rec:148435) at time 24268)
 -> Assertion `t->desched_rec() || is_rrcall_notify_syscall_hook_exit_syscall( t->regs().original_syscallno(), t->arch()) || t->ip() == t->vm() ->privileged_traced_syscall_ip() .increment_by_syscall_insn_length(t->arch())' failed to hold. Stashed signal pending on syscall entry when it shouldn't be: {signo:SIGSTOP,errno:SUCCESS,code:SI_TKILL}; regs={ ip:0x7ba75134c7fb args:(0xefffffffffffffff,0,0,0,0,0) orig_syscall: 24 syscallno: -38 }; last_execution_resume=0x7ba75134c7fb; sig ip=0x7ba75134c7fb
Tail of trace dump:
... and then the trace dump goes

Here's a full output. mysqld.1.err.txt

I'll get back with the dumps you've requested as well.

FooBarrior avatar May 14 '24 12:05 FooBarrior

I tried one more thing: I added fprintf(stderr, "Exiting...\n"); right before the call of exit(exit_code); in mysqld_exit. I and I saw that output:

2024-05-14 14:59:42 0 [Note] /home/nik/mariadb/bld/sql/mariadbd: Shutdown complete

Exiting...
[FATAL src/RecordSession.cc:2028:process_syscall_entry()] 
 (task 200630 (rec:200630) at time 25456)
 -> Assertion `t->desched_rec() |....

So it asserts in the real end of execution.

FooBarrior avatar May 14 '24 13:05 FooBarrior

@rocallahan here are RR_LOG=all log output and rr dump archived rr_dumps.zip

FooBarrior avatar May 14 '24 16:05 FooBarrior

What exact debugger commands are you running?

The simple r to the end of record gives a crash.

Just one note, that I am doing it with -s option, and connect a gdb client. Sorry for the confusion. I cannot test it without -s, something fails in the python script (with Remote connection closed), and a usual gdb session starts.

FooBarrior avatar May 14 '24 20:05 FooBarrior

You may also want to ckeck the output of ~/mariadb/bld/mysql-test/var/log/mysqld.1.err.

Nothing there on my machine. Just the expected

2024-05-14 23:21:36 0 [Note] /home/roc/mariadb/bld/sql/mariadbd: Shutdown complete

rocallahan avatar May 16 '24 10:05 rocallahan

b191809f8ba7491c08f1718c8b2a173814526f22 will log the pid of the task that sent the SIGSTOP, so that should help us figure out where that comes from.

rocallahan avatar May 16 '24 11:05 rocallahan

Another thing you can do is record with -F. That will force rr into emergency debug mode when the assertion trips. You can then grab the emergency-debug instructions from the log and run them to get into a gdb session at the point where the tracee is stuck, and you can get a backtrace and dump it here.

BTW when you send an rr dump, the -b -m -p options are helpful. Thanks!

rocallahan avatar May 16 '24 11:05 rocallahan

Hi, I was going to return to this when I had some time, and here it is.

After the latest update, everything works correctly. Any Idea what happened?

last version I was using was https://github.com/rr-debugger/rr/commit/b7c3913039592e2a7ce58b3d5c7ff37f0df5a6fb

FooBarrior avatar Jun 14 '24 13:06 FooBarrior

No idea. You could bisect the git revisions if you really want to know.

rocallahan avatar Jun 14 '24 21:06 rocallahan

I will close it then. Thank you for your support.

FooBarrior avatar Jun 14 '24 22:06 FooBarrior