gobpf
elf: perf event ordering has bugs
The algorithm to re-order perf events coming from several cpus has some bugs:
- the loop condition on incoming events is incorrect because the variable incoming is modified inside the loop
- the userspace clock clock_gettime(CLOCK_MONOTONIC) seems to give different results than the kernel clock bpf_ktime_get_ns. We should check other clocks like CLOCK_MONOTONIC_RAW. Or use a different algorithm that does not need to use the userspace clock
- despite the comment in bpf_ktime_get_ns saying it is a monotonic clock, another comment says it is not guaranteed to be monotonic. I am not sure which one is true. If it is not monotonic, we'll need a kernel fix since there is only one bpf helper function to get a timestamp.
I can reproduce the incorrect ordering with this test. I attach an eBPF kprobe on the uname system call. A shell script running uname in a loop was not fast enough to reproduce the problem, so I wrote a small program in Golang that calls syscall.Uname as fast as possible, then executed it in parallel on 2 different CPUs:
taskset --cpu-list 0 uname-loop
taskset --cpu-list 1 uname-loop
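The uname-loop program itself is not included in the issue. A minimal sketch of what such a program might look like is given here, assuming Linux (syscall.Uname is Linux-only) and a bounded iteration count; the function name unameLoop and the count are hypothetical, chosen only for illustration:

```go
package main

import (
	"fmt"
	"syscall"
)

// unameLoop calls the uname system call n times back to back and returns
// how many calls succeeded. The name and the bounded count are assumptions
// for this sketch; the issue only says the loop runs as fast as possible.
func unameLoop(n int) (int, error) {
	var buf syscall.Utsname
	for i := 0; i < n; i++ {
		if err := syscall.Uname(&buf); err != nil {
			return i, err
		}
	}
	return n, nil
}

func main() {
	// Hypothetical iteration count; raise it to keep the kprobe busy longer.
	const iterations = 100000
	done, err := unameLoop(iterations)
	if err != nil {
		fmt.Println("uname failed after", done, "calls:", err)
		return
	}
	fmt.Println("completed", done, "uname calls")
}
```

Built as uname-loop, one instance can be pinned to each CPU with the taskset commands above so that both CPUs produce events at the same time.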
I am working on a patch. I want to avoid using the userspace clock clock_gettime at all.
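One way to order events without any userspace clock is a watermark merge. This is a sketch of the general technique, not the patch mentioned above. It assumes that the timestamps from each CPU arrive in non-decreasing order and that all CPUs share one kernel clock; the types event and merger and their methods are hypothetical names. An event is emitted only once every CPU has been seen at or past its timestamp, so no later arrival can precede it:

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical decoded perf event. Timestamp is assumed to be
// the bpf_ktime_get_ns value recorded in the kernel.
type event struct {
	CPU       int
	Timestamp uint64
}

// merger buffers events from all CPUs and tracks, per CPU, the newest
// kernel timestamp read so far.
type merger struct {
	pending  []event
	lastSeen []uint64
}

func newMerger(ncpu int) *merger {
	return &merger{lastSeen: make([]uint64, ncpu)}
}

// add buffers one event and advances the lastSeen mark of its CPU.
func (m *merger) add(e event) {
	m.pending = append(m.pending, e)
	if e.Timestamp > m.lastSeen[e.CPU] {
		m.lastSeen[e.CPU] = e.Timestamp
	}
}

// flush returns, in timestamp order, every buffered event that is not newer
// than the watermark (the smallest lastSeen value across all CPUs). Newer
// events stay buffered. A CPU that never produces events holds the
// watermark at zero, which is a known limitation of this sketch.
func (m *merger) flush() []event {
	if len(m.lastSeen) == 0 {
		return nil
	}
	watermark := m.lastSeen[0]
	for _, ts := range m.lastSeen[1:] {
		if ts < watermark {
			watermark = ts
		}
	}
	sort.SliceStable(m.pending, func(i, j int) bool {
		return m.pending[i].Timestamp < m.pending[j].Timestamp
	})
	n := sort.Search(len(m.pending), func(i int) bool {
		return m.pending[i].Timestamp > watermark
	})
	ready := append([]event(nil), m.pending[:n]...)
	m.pending = m.pending[n:]
	return ready
}

func main() {
	m := newMerger(2)
	m.add(event{CPU: 0, Timestamp: 10})
	m.add(event{CPU: 0, Timestamp: 30})
	m.add(event{CPU: 1, Timestamp: 20})
	// Watermark is 20: timestamps 10 and 20 are emitted, 30 is held back.
	fmt.Println(m.flush())
	m.add(event{CPU: 1, Timestamp: 25})
	m.add(event{CPU: 1, Timestamp: 40})
	// Watermark is now 30: timestamps 25 and 30 are emitted, 40 is held back.
	fmt.Println(m.flush())
}
```

The trade-off is latency: an idle CPU delays output from busy ones, so a real implementation would probably need a timeout or some other way to advance the watermark.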
/cc @iaguis @krnowak @2opremio
Discussion about the clocks bpf_ktime_get_ns and clock_gettime: https://github.com/iovisor/bcc/issues/931
Summary:
- bpf_ktime_get_ns should use the same clock as clock_gettime(CLOCK_MONOTONIC)
- but the clock might be incorrect due to a kernel bug; now fixed (see details about kernel versions in Ubuntu: https://github.com/weaveworks/scope/issues/2334)
@alban what's left here? Could this be the cause of https://github.com/weaveworks/scope/issues/2650 ?
We continue to observe events arriving out of order from time to time.
Would it be an idea to move beforeHarvest back in time by a few milliseconds? We would report events a little late, but we would be less likely to hit whatever race causes them to come out of order.