ebpf-verifier icon indicating copy to clipboard operation
ebpf-verifier copied to clipboard

Verification failures are printed in label order, making investigating failures difficult

Open Alan-Jowett opened this issue 9 months ago • 1 comments

Failures in large eBPF programs with complex control flow can be hard to analyze. Of particular concern is that an assertion failure can result in invariants being removed from the post-invariant set, which causes a cascade of subsequent assertion failures which strip invariants.

Locating the initial invariant violation that triggers this cascade of failures can be challenging assuming the failures are printed in label order (as labels don't necessarily match the control flow).

One possible solution is to provide an option to print errors in WTO (weak topographical order), which will tend to place initial failures earlier in the list of failures.

A recent case in point is a verification failure in a Cilium program, where a single correlated branch failure triggers around 500 subsequent failures. Because of significant inlining and branch folding by LLVM, the initial failure was present in the middle of the list of 500 failures making it difficult to locate.

When sorting the errors in WTO, the failure is close to the top of the list.

Alan-Jowett avatar May 10 '24 15:05 Alan-Jowett

The flag --assume-assert is designed to help in this exact case. Doesn't it?

elazarg avatar May 11 '24 18:05 elazarg

@Alan-Jowett did you experiment with that flag?

elazarg avatar May 22 '24 16:05 elazarg

Thanks for the feedback. What does this flag do? Should it be on by default when performing the analysis? I will try it out.

Alan-Jowett avatar May 22 '24 16:05 Alan-Jowett

The flag forces the invariants to match the assertions, even if they did not. So errors are not repeated and not compounded.

It used to be the default. I don't remember the reasoning behind disabling it, but in part it's because expected failures in tests stress the code less (since many invariants drop to bottom immediately).

Please let me know if the flag helps.

elazarg avatar May 22 '24 16:05 elazarg

That flag is awesome!

From 507 errors -> 1 error, and it's the actual relevant error that is causing the problem. This is even better than my approach. Is there any harm in making this the default in eBPF-for-Window's calls to the verifier?

I will go ahead and close this PR unless you see value in it.

Alan-Jowett avatar May 22 '24 16:05 Alan-Jowett

I think it should be on by default during debugging, and off by default in tests.

elazarg avatar May 22 '24 16:05 elazarg

Thanks!

Alan-Jowett avatar May 22 '24 16:05 Alan-Jowett