rr
Divergence with tcmalloc on arm64
I'm seeing the following divergence while replaying a program that uses tcmalloc on arm64:
[FATAL src/ReplaySession.cc:1226:check_ticks_consistency()]
(task 2944657 (rec:2944634) at time 424)
-> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SIGNAL: SIGSEGV(det)'; expected 10014507, got 10014509
I suspect this is caused by reads of CNTVCT_EL0 in the tcmalloc code. Unfortunately, the kernel does not support trapping counter register accesses on arm64:
prctl(PR_SET_TSC, PR_TSC_SIGSEGV) = -1 EINVAL (Invalid argument)
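For anyone who wants to check their own kernel, here is a rough probe sketch in Python. It uses PR_GET_TSC rather than PR_SET_TSC so that it does not change the process state. This assumes that a kernel which rejects PR_SET_TSC also rejects PR_GET_TSC, which is my understanding but which I have not verified on every architecture. The function name `tsc_control_supported` is made up for this example, and the constants are copied from `<linux/prctl.h>`.

```python
import ctypes
import errno

# Values from <linux/prctl.h>.
PR_GET_TSC = 25
PR_TSC_ENABLE = 1    # timestamp counter reads are allowed
PR_TSC_SIGSEGV = 2   # timestamp counter reads raise SIGSEGV


def tsc_control_supported():
    """Hypothetical helper: ask the kernel for the current TSC mode.

    Returns (True, mode) if the prctl succeeds, otherwise (False, errno).
    Linux only: it looks up prctl in the already-loaded C library.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    mode = ctypes.c_int(0)
    zero = ctypes.c_ulong(0)
    ctypes.set_errno(0)
    rc = libc.prctl(PR_GET_TSC, ctypes.byref(mode), zero, zero, zero)
    if rc == 0:
        return True, mode.value
    return False, ctypes.get_errno()


if __name__ == "__main__":
    ok, detail = tsc_control_supported()
    if ok:
        print("TSC control supported, current mode:", detail)
    else:
        print("TSC control not supported:", errno.errorcode.get(detail, detail))
```

On a kernel that behaves like the one above, I would expect the second branch to report EINVAL.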
The kernel could configure the CPU to trap these accesses by clearing CNTKCTL_EL1.EL0VCTEN.
The kernel side of this is https://lore.kernel.org/all/[email protected]/T/
(I also confirmed that it's CNTVCT_EL0 -- if I nop out the MRS instruction in the binary I can no longer reproduce the divergence.)
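To repeat that experiment, a sketch along these lines may help. It is not the exact procedure used above, only an illustration: it scans a buffer of little-endian A64 instructions for `mrs Xt, cntvct_el0` (encoding 0xD53BE040 with the destination register in the low 5 bits) and overwrites each match with `nop` (0xD503201F). The function name `nop_out_cntvct_reads` is hypothetical, and other counter registers are not handled.

```python
import struct

MRS_CNTVCT_EL0 = 0xD53BE040  # mrs x0, cntvct_el0; bits [4:0] hold Rt
RT_MASK = 0x1F
NOP = 0xD503201F


def nop_out_cntvct_reads(code):
    """Hypothetical helper: replace every `mrs Xt, cntvct_el0` with `nop`.

    `code` is a bytes object holding 4-byte-aligned A64 instructions.
    Returns (patched_bytes, list_of_patched_offsets).
    """
    patched = bytearray(code)
    offsets = []
    for off in range(0, len(code) - 3, 4):
        (insn,) = struct.unpack_from("<I", code, off)
        # Ignore the destination register when matching.
        if (insn & ~RT_MASK) == MRS_CNTVCT_EL0:
            struct.pack_into("<I", patched, off, NOP)
            offsets.append(off)
    return bytes(patched), offsets
```

This should only be run over the bytes of an executable section, since data that happens to match the encoding would be patched too. The patched binary is useful for diagnosis only: after the `nop`, the destination register keeps whatever value it had before.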