gobpf

elf: perf event ordering has bugs

Open alban opened this issue 8 years ago • 3 comments

The algorithm that re-orders perf events coming from several CPUs has some bugs.

I can reproduce the incorrect ordering with this test: I attach an eBPF kprobe to the uname system call. A shell script running uname in a loop was not fast enough to reproduce the problem, so I wrote a small Go program that calls syscall.Uname as fast as possible, then ran it in parallel on 2 different CPUs:

  • taskset --cpu-list 0 uname-loop
  • taskset --cpu-list 1 uname-loop
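The uname-loop program itself is not included in this issue. A minimal sketch of what such a program could look like is shown here; the `unameLoop` helper and the `-n` flag are hypothetical names added for illustration, and the code assumes Linux, where `syscall.Uname` is available.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"syscall"
)

// unameLoop calls uname(2) n times and returns how many calls succeeded
// before the first error. n <= 0 means: keep going until an error occurs.
// (Hypothetical helper, not taken from the original test program.)
func unameLoop(n int) (int, error) {
	var buf syscall.Utsname
	done := 0
	for n <= 0 || done < n {
		if err := syscall.Uname(&buf); err != nil {
			return done, err
		}
		done++
	}
	return done, nil
}

func main() {
	// The -n flag is an assumption; pass -n 0 to loop forever.
	n := flag.Int("n", 1000000, "number of uname calls; 0 loops forever")
	flag.Parse()
	if _, err := unameLoop(*n); err != nil {
		fmt.Fprintln(os.Stderr, "uname:", err)
		os.Exit(1)
	}
}
```

Built as `uname-loop`, one instance would be pinned to each CPU with the taskset commands above, for example `taskset --cpu-list 0 ./uname-loop -n 0`.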

I am working on a patch. I want to avoid using the userspace clock clock_gettime at all.
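One way to order events without consulting a userspace clock is a k-way merge of per-CPU batches keyed only on the kernel timestamp. The sketch below is an illustration under stated assumptions, not the patch mentioned above: the `event` type, its fields, and `mergeByTimestamp` are hypothetical, each event is assumed to carry a timestamp written by the eBPF program with bpf_ktime_get_ns, and each per-CPU batch is assumed to be already in timestamp order.

```go
package main

import (
	"container/heap"
	"fmt"
)

// event is a hypothetical decoded perf sample. Timestamp is assumed to be
// filled in by the eBPF program using bpf_ktime_get_ns.
type event struct {
	CPU       int
	Timestamp uint64
}

// cursor points at the next unread event of one per-CPU batch.
type cursor struct {
	batch []event
	pos   int
}

// cursorHeap is a min-heap of cursors ordered by their next event's timestamp.
type cursorHeap []*cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
	return h[i].batch[h[i].pos].Timestamp < h[j].batch[h[j].pos].Timestamp
}
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeByTimestamp merges per-CPU batches, each already sorted by timestamp,
// into a single slice ordered by kernel timestamp.
func mergeByTimestamp(perCPU [][]event) []event {
	h := &cursorHeap{}
	for _, b := range perCPU {
		if len(b) > 0 {
			*h = append(*h, &cursor{batch: b})
		}
	}
	heap.Init(h)
	var out []event
	for h.Len() > 0 {
		c := (*h)[0] // cursor whose next event is the oldest
		out = append(out, c.batch[c.pos])
		c.pos++
		if c.pos < len(c.batch) {
			heap.Fix(h, 0) // re-position the cursor for its new head event
		} else {
			heap.Pop(h) // this batch is exhausted
		}
	}
	return out
}

func main() {
	merged := mergeByTimestamp([][]event{
		{{CPU: 0, Timestamp: 10}, {CPU: 0, Timestamp: 40}},
		{{CPU: 1, Timestamp: 20}, {CPU: 1, Timestamp: 30}},
	})
	for _, e := range merged {
		fmt.Println(e.CPU, e.Timestamp)
	}
}
```

A merge like this only orders events that have already been read. It does not by itself decide when it is safe to report an event while other CPUs may still deliver older ones, which is the harder part of the problem.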

/cc @iaguis @krnowak @2opremio

alban avatar May 05 '17 09:05 alban

Discussion about the bpf_ktime_get_ns and clock_gettime clocks: https://github.com/iovisor/bcc/issues/931

Summary:

  • bpf_ktime_get_ns should use the same clock as clock_gettime(CLOCK_MONOTONIC)
  • but the clock might be incorrect due to a kernel bug; now fixed (see details about kernel versions in Ubuntu: https://github.com/weaveworks/scope/issues/2334)
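For reference, the userspace side of that comparison can be sketched as follows. This is a minimal illustration that assumes Linux, where the CLOCK_MONOTONIC clock id is 1; `monotonicNs` is a hypothetical helper name, and the raw syscall is used only because the standard library's `syscall` package has no clock_gettime wrapper.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// clockMonotonic is the Linux clock id for CLOCK_MONOTONIC.
const clockMonotonic = 1

// monotonicNs reads CLOCK_MONOTONIC through the clock_gettime system call
// and returns it in nanoseconds, the unit bpf_ktime_get_ns also uses.
// (Hypothetical helper for illustration.)
func monotonicNs() (uint64, error) {
	var ts syscall.Timespec
	_, _, errno := syscall.Syscall(
		syscall.SYS_CLOCK_GETTIME,
		clockMonotonic,
		uintptr(unsafe.Pointer(&ts)),
		0,
	)
	if errno != 0 {
		return 0, errno
	}
	return uint64(ts.Sec)*1e9 + uint64(ts.Nsec), nil
}

func main() {
	now, err := monotonicNs()
	if err != nil {
		fmt.Println("clock_gettime failed:", err)
		return
	}
	fmt.Println("CLOCK_MONOTONIC (ns):", now)
}
```

On a kernel without the bug mentioned above, a value read this way should be directly comparable with timestamps recorded by bpf_ktime_get_ns.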

alban avatar May 05 '17 10:05 alban

@alban what's left here? Could this be the cause of https://github.com/weaveworks/scope/issues/2650 ?

2opremio avatar Jul 11 '17 16:07 2opremio

We continue to observe events arriving out of order from time to time.

Would it be an idea to move beforeHarvest back in time by a few milliseconds? We would report events a little late, but we would be less likely to hit whatever race causes them to come out of order.
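The idea can be sketched roughly as follows, assuming beforeHarvest is a timestamp taken just before the ring buffers are read and that only events older than it are reported in that round. The `event` type, `splitByCutoff`, and the `margin` parameter are hypothetical names for illustration and do not describe the existing code.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical decoded perf sample with a kernel timestamp in ns.
type event struct {
	Timestamp uint64
}

// splitByCutoff reports only events older than beforeHarvest minus margin,
// sorted by timestamp, and holds the rest back for the next round.
func splitByCutoff(pending []event, beforeHarvest, margin uint64) (ready, held []event) {
	cutoff := uint64(0)
	if beforeHarvest > margin {
		cutoff = beforeHarvest - margin // avoid unsigned underflow
	}
	for _, e := range pending {
		if e.Timestamp < cutoff {
			ready = append(ready, e)
		} else {
			held = append(held, e)
		}
	}
	sort.Slice(ready, func(i, j int) bool {
		return ready[i].Timestamp < ready[j].Timestamp
	})
	return ready, held
}

func main() {
	pending := []event{{7500000}, {3000000}, {9000000}}
	// beforeHarvest = 10 ms, margin = 2 ms, so the cutoff is 8 ms.
	ready, held := splitByCutoff(pending, 10000000, 2000000)
	fmt.Println("ready:", ready)
	fmt.Println("held:", held)
}
```

With these sample values, the event at 9 ms is newer than the cutoff and waits for the next round. The trade-off is added latency equal to the margin in exchange for a wider window in which late events from other CPUs can still be slotted into order.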

bboreham avatar Jul 04 '19 14:07 bboreham