gobpf elf: perf event ordering has bugs

The algorithm to re-order perf events coming from several cpus has some bugs:

the loop condition on incoming events is incorrect because the variable incoming is modified inside the loop
the userspace clock clock_gettime(CLOCK_MONOTONIC) seems to give different results than the kernel clock bpf_ktime_get_ns. We should check other clocks such as CLOCK_MONOTONIC_RAW, or use a different algorithm that does not need the userspace clock
despite the comment in bpf_ktime_get_ns saying it is a monotonic clock, another comment says it is not guaranteed to be monotonic. I am not sure which one is true. If the clock is indeed not monotonic, we'll need a kernel fix, since there is only one bpf helper function to get a timestamp.

I can reproduce the incorrect ordering with this test. I attach an eBPF kprobe to the uname system call. A shell script running uname in a loop was not fast enough to reproduce the problem, so I wrote a small Go program that calls syscall.Uname as fast as possible, then ran it in parallel on 2 different CPUs:

taskset --cpu-list 0 uname-loop
taskset --cpu-list 1 uname-loop
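
For reference, a program like uname-loop could look roughly like the sketch below. This is not the original program, only a minimal guess at it: the helper name callUname and its iteration-count parameter are made up for illustration, and it builds on Linux only because it relies on syscall.Uname.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// callUname issues the uname system call n times and returns the first
// error it hits. A negative n loops until an error occurs.
// (Hypothetical helper, not taken from the original reproducer.)
func callUname(n int) error {
	var buf syscall.Utsname
	for i := 0; n < 0 || i < n; i++ {
		if err := syscall.Uname(&buf); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Loop forever so the kprobe fires as fast as this CPU allows.
	if err := callUname(-1); err != nil {
		fmt.Fprintln(os.Stderr, "uname:", err)
		os.Exit(1)
	}
}
```

Built as uname-loop, it can be pinned to a CPU with the taskset commands above.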

I am working on a patch. I want to avoid using the userspace clock clock_gettime at all.

/cc @iaguis @krnowak @2opremio

May 05 '17 09:05 alban

Discussion about the clocks bpf_ktime_get_ns and clock_gettime: https://github.com/iovisor/bcc/issues/931

Summary:

bpf_ktime_get_ns should use the same clock as clock_gettime(CLOCK_MONOTONIC)
but the clock might be incorrect due to a kernel bug, which is now fixed (see details about kernel versions in Ubuntu: https://github.com/weaveworks/scope/issues/2334)

May 05 '17 10:05 alban

@alban what's left here? Could this be the cause of https://github.com/weaveworks/scope/issues/2650 ?

Jul 11 '17 16:07 2opremio

We continue to observe events arriving out of order from time to time.

Would it be an idea to move beforeHarvest back in time by a few milliseconds? We would report events a little late, but we would be less likely to hit whatever race causes them to come out of order.

Jul 04 '19 14:07 bboreham
