Divergence with tcmalloc on arm64

Open pcc opened this issue 1 year ago • 2 comments

I'm seeing the following divergence while replaying a program that uses tcmalloc on arm64:

[FATAL src/ReplaySession.cc:1226:check_ticks_consistency()]
 (task 2944657 (rec:2944634) at time 424)
 -> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SIGNAL: SIGSEGV(det)'; expected 10014507, got 10014509

I suspect this is caused by reads of CNTVCT_EL0 in the tcmalloc code. Unfortunately, the kernel does not support trapping counter register accesses on arm64:

prctl(PR_SET_TSC, PR_TSC_SIGSEGV)       = -1 EINVAL (Invalid argument)
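For anyone who wants to check their own kernel, here is a minimal sketch of that probe in C. The helper name `probe_tsc_trap` is made up for illustration; only `prctl`, `PR_GET_TSC`, `PR_SET_TSC` and `PR_TSC_SIGSEGV` are real. It restores the previous mode right away so the calling process does not fault on a later counter read.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper (not part of rr or tcmalloc): checks whether the
 * running kernel accepts PR_SET_TSC with PR_TSC_SIGSEGV.
 * Returns 0 if trapping can be enabled, or the errno value reported by
 * the failing prctl call (EINVAL in the strace output above).
 */
int probe_tsc_trap(void)
{
    int saved = 0;

    /* PR_GET_TSC writes the current mode into the int that arg2 points to. */
    if (prctl(PR_GET_TSC, &saved, 0, 0, 0) != 0)
        return errno;

    /* Ask the kernel to deliver SIGSEGV on counter reads from user space. */
    if (prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0) != 0)
        return errno;

    /* Put the original mode back before anything reads the counter. */
    if (prctl(PR_SET_TSC, saved, 0, 0, 0) != 0)
        return errno;

    return 0;
}
```

A return value of 0 means the kernel can trap counter reads for this process; a nonzero value such as EINVAL matches the failure shown above.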

It would be possible for the kernel to configure the CPU to trap on this access by clearing CNTKCTL_EL1.EL0VCTEN.

pcc avatar Apr 26 '24 23:04 pcc

The kernel side of this is https://lore.kernel.org/all/[email protected]/T/

pcc avatar Apr 27 '24 05:04 pcc

(I also confirmed that it's CNTVCT_EL0 -- if I nop out the MRS instruction in the binary I can no longer reproduce the divergence.)
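For reference, the kind of read in question looks roughly like the sketch below. This is not the tcmalloc source; `read_virtual_counter` is a hypothetical name, and the non-arm64 branch exists only so the snippet compiles elsewhere. On arm64 Linux the MRS normally executes at EL0 without trapping, so the value it returns is not something a replay can reproduce.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reads the arm64 virtual counter from user space.
 * Returns 0 and stores the counter value in *out on arm64; returns -1
 * on any other architecture, leaving *out untouched.
 */
int read_virtual_counter(uint64_t *out)
{
#if defined(__aarch64__)
    uint64_t v;
    /* Untrapped EL0 read of CNTVCT_EL0 (assumes the kernel leaves
       CNTKCTL_EL1.EL0VCTEN set, which is the usual configuration). */
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
    *out = v;
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```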

pcc avatar Apr 27 '24 06:04 pcc