CI: tests crash with exit code 42

Open lmb opened this issue 3 years ago • 3 comments

We're having problems with our CI, where tests fairly often fail with exit code 42. That error is generated when the VM doesn't output anything and no success file is generated.

https://github.com/cilium/ebpf/blob/bf256fd5e8e6261dd9d189e1784d2224c3d7a237/run-tests.sh#L60-L62

This happens across all of the packages we test and across multiple major kernel versions. It doesn't always reproduce, but usually rebuilding a PR once or twice will trigger the problem at least once.

lmb avatar Mar 15 '22 11:03 lmb

I've been banging my head against this for a while, and finally made some progress. I enabled tracing of the kvm_run_exit event in qemu using -trace kvm_run_exit:

[email protected]:kvm_run_exit cpu_index 0, reason 2
[email protected]:kvm_run_exit cpu_index 0, reason 8

reason is given to us by the kernel, as the exit_reason field of struct kvm_run. 2 is KVM_EXIT_IO, 8 is KVM_EXIT_SHUTDOWN. The latter sounds innocuous, but is actually only generated in very rare circumstances: https://elixir.bootlin.com/linux/v5.16.14/A/ident/KVM_EXIT_SHUTDOWN It seems to be triggered when the VM experiences a triple fault.
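
For anyone reading similar traces, here is a minimal sketch that maps the numeric reason in a kvm_run_exit trace line to its KVM_EXIT_* name. The table is a subset of the constants in include/uapi/linux/kvm.h; the helper name decode_run_exit and the regular expression are illustrative assumptions, not part of qemu or the kernel.

```python
import re

# Subset of the KVM_EXIT_* constants from include/uapi/linux/kvm.h.
KVM_EXIT_REASONS = {
    0: "KVM_EXIT_UNKNOWN",
    1: "KVM_EXIT_EXCEPTION",
    2: "KVM_EXIT_IO",
    3: "KVM_EXIT_HYPERCALL",
    4: "KVM_EXIT_DEBUG",
    5: "KVM_EXIT_HLT",
    6: "KVM_EXIT_MMIO",
    7: "KVM_EXIT_IRQ_WINDOW_OPEN",
    8: "KVM_EXIT_SHUTDOWN",
    9: "KVM_EXIT_FAIL_ENTRY",
    10: "KVM_EXIT_INTR",
}

# Matches the tail of a qemu "-trace kvm_run_exit" line, as quoted above.
TRACE_RE = re.compile(r"kvm_run_exit cpu_index (\d+), reason (\d+)")


def decode_run_exit(line):
    """Return (cpu_index, reason_name) for a trace line, or None if it does not match."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    cpu, reason = int(match.group(1)), int(match.group(2))
    # Fall back to the raw number for reasons missing from the subset above.
    return cpu, KVM_EXIT_REASONS.get(reason, f"unknown ({reason})")
```

Feeding it the two lines above should yield KVM_EXIT_IO followed by KVM_EXIT_SHUTDOWN for CPU 0.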

This means digging into what's happening in the kernel. Using perf to record all kvm tracepoints I managed to capture the following:

...
[001]   770.850155:                      kvm:kvm_entry: vcpu 0, rip 0x1000fe
[001]   770.850177:                       kvm:kvm_exit: vcpu 0 reason EXTERNAL_INTERRUPT rip 0x100107 info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x800000fb error_code 0x00000000
[001]   770.850207:                      kvm:kvm_entry: vcpu 0, rip 0x100107
[001]   770.850228:                       kvm:kvm_exit: vcpu 0 reason CR_ACCESS rip 0x100143 info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
[001]   770.850234:                         kvm:kvm_cr: cr_write 0 = 0x80000001
[001]   770.850287:                      kvm:kvm_entry: vcpu 0, rip 0x100146
[001]   770.850307:                       kvm:kvm_exit: vcpu 0 reason TRIPLE_FAULT rip 0x100146 info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
[001]   770.850313:                        kvm:kvm_fpu: unload
[001]   770.850316:             kvm:kvm_userspace_exit: reason KVM_EXIT_SHUTDOWN (8)
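
As a reading aid for the kvm_cr line in this trace, the following sketch decodes a CR0 value into the architectural flag names defined for x86. It only shows which bits are set in 0x80000001; it makes no claim about why the triple fault follows. The helper name decode_cr0 is hypothetical.

```python
# Architecturally defined CR0 flag bits on x86 (bit position -> name).
CR0_FLAGS = {
    0: "PE",   # protected mode enable
    1: "MP",
    2: "EM",
    3: "TS",
    4: "ET",
    5: "NE",
    16: "WP",
    18: "AM",
    29: "NW",
    30: "CD",
    31: "PG",  # paging
}


def decode_cr0(value):
    """Return the names of the CR0 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR0_FLAGS.items()) if value & (1 << bit)]


print(decode_cr0(0x80000001))  # ['PE', 'PG']
```

In other words, the guest's last CR0 write before the TRIPLE_FAULT exit has both protected mode and paging enabled.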

lmb avatar Mar 15 '22 11:03 lmb

Sent an email to the KVM mailing list: https://lore.kernel.org/kvm/[email protected]/T/#u

lmb avatar Mar 16 '22 12:03 lmb

No takers on the mailing list, unfortunately. I've decided to report it to Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-5.13/+bug/1970034

lmb avatar Apr 23 '22 12:04 lmb

The workaround works, so I'm closing this.

lmb avatar Sep 01 '22 15:09 lmb

Hi,

What was the workaround? I am interested, as I can't get my VM up successfully and I'm getting "CPU-27633 [000] 67926.303544: kvm_exit: reason CR_ACCESS rip 0xcf39 info 100 0"

juhoarvid avatar Sep 23 '22 06:09 juhoarvid

The workaround is to restart qemu a couple of times! Your error message looks different than ours though.
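
A rough sketch of that retry idea, assuming exit code 42 is the only signal worth retrying on. This is not the actual CI change; the names run_with_retries and run_command, the default of three attempts, and the command in the usage comment are all hypothetical.

```python
import subprocess

CRASH_EXIT_CODE = 42  # the exit code described in this issue


def run_with_retries(run, attempts=3):
    """Call run() until it returns something other than CRASH_EXIT_CODE.

    run is any callable returning an exit code. Real test failures
    (any other non-zero code) are returned immediately, not retried.
    """
    code = CRASH_EXIT_CODE
    for _ in range(attempts):
        code = run()
        if code != CRASH_EXIT_CODE:
            break
    return code


def run_command(cmd):
    """Run cmd (a list of arguments) and return its exit code."""
    return subprocess.run(cmd).returncode


# Hypothetical usage; the script argument is an assumption:
# exit_code = run_with_retries(lambda: run_command(["./run-tests.sh", "5.10"]))
```

Retrying only on 42 keeps genuine test failures visible instead of masking them behind extra attempts.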

lmb avatar Sep 29 '22 08:09 lmb