ebpf filter issue with `ip6 protochain 58`

Open mss opened this issue 8 months ago • 11 comments

I am trying to use this tool to debug an NDP issue. When I run `pwru ip6 protochain 58`, I get this error:

2025/04/30 09:51:53 Failed to inject filter ebpf for kprobe_skb_4: unable to compute blocks: instruction 18: ja 4294967281 flows past last instruction

Just calling `pwru ip6 proto 58` works as intended (side issue: `icmp6` should be an alias for that one but results in a syntax error).

This is a Proxmox server running 6.8.12-9-pve.

mss avatar Apr 30 '25 10:04 mss

Thanks for the issue.

Hit the same on the 6.11.0-25-generic kernel. Looks like using protochain hits the BPF instruction limit.

Is it possible to limit how "deep" the protochain should parse?

brb avatar May 20 '25 07:05 brb

I don't think that's possible since theoretically it could be arbitrarily deep. The manpage says:

The BPF code emitted by this primitive is complex and cannot be optimized by the BPF optimizer code, and is not supported by filter engines in the kernel, so this can be somewhat slow, and may cause more packets to be dropped.

Maybe the best solution (for now and if possible) would be to just disallow protochain with some kind of useful error?

mss avatar May 20 '25 08:05 mss

Could you add the limitation to https://github.com/cilium/pwru/blob/main/KNOWN_ISSUES.md?

brb avatar May 20 '25 10:05 brb

This looks like a libpcap bug to me.

Libpcap compiles ip6 protochain 58 into:

$ tcpdump -dd ip6 protochain 58 | cat -n
Warning: assuming Ethernet
     1	{ 0x28, 0, 0, 0x0000000c },
     2	{ 0x15, 0, 33, 0x000086dd },
     3	{ 0x30, 0, 0, 0x00000014 },
     4	{ 0x1, 0, 0, 0x00000028 },
     5	{ 0x15, 27, 0, 0x0000003a },
     6	{ 0x15, 26, 0, 0x0000003b },
     7	{ 0x15, 3, 0, 0x00000000 },
     8	{ 0x15, 2, 0, 0x0000003c },
     9	{ 0x15, 1, 0, 0x0000002b },
    10	{ 0x15, 0, 9, 0x0000002c },
    11	{ 0x50, 0, 0, 0x0000000e },
    12	{ 0x2, 0, 0, 0x00000000 },
    13	{ 0x50, 0, 0, 0x0000000f },
    14	{ 0x4, 0, 0, 0x00000001 },
    15	{ 0x24, 0, 0, 0x00000008 },
    16	{ 0xc, 0, 0, 0x00000000 },
    17	{ 0x7, 0, 0, 0x00000000 },
    18	{ 0x60, 0, 0, 0x00000000 },
    19	{ 0x5, 0, 0, 0xfffffff1 },
    20	{ 0x15, 0, 12, 0x00000033 },
    21	{ 0x87, 0, 0, 0x00000000 },
    22	{ 0x50, 0, 0, 0x0000000e },
    23	{ 0x2, 0, 0, 0x00000000 },
    24	{ 0x87, 0, 0, 0x00000000 },
    25	{ 0x4, 0, 0, 0x00000001 },
    26	{ 0x7, 0, 0, 0x00000000 },
    27	{ 0x50, 0, 0, 0x0000000e },
    28	{ 0x4, 0, 0, 0x00000002 },
    29	{ 0x24, 0, 0, 0x00000004 },
    30	{ 0x7, 0, 0, 0x00000000 },
    31	{ 0x60, 0, 0, 0x00000000 },
    32	{ 0x5, 0, 0, 0xffffffe4 },
    33	{ 0x4, 0, 0, 0x00000000 },
    34	{ 0x15, 0, 1, 0x0000003a },
    35	{ 0x6, 0, 0, 0x00040000 },
    36	{ 0x6, 0, 0, 0x00000000 },

Look at line 19, `{ 0x5, 0, 0, 0xfffffff1 }`, which can be interpreted as `ja 0xfffffff1`, but that makes no sense. That is exactly where we saw the error `unable to compute blocks: instruction 18: ja 4294967281 flows past last instruction` (0xfffffff1 == 4294967281; the error counts instructions from zero, so instruction 18 is line 19 of the numbered output). cloudflare/cbpfc can't convert these cBPF instructions into eBPF, and neither can I.
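For readers following along, here is a small sketch of the arithmetic behind that operand. It assumes the `k` field of a cBPF `ja` is meant to be read as a signed 32-bit offset relative to the next instruction (which is how later comments in this thread describe it: a backward jump). The helper names `sign_extend_32` and `ja_target` are made up for illustration and are not part of pwru, cbpfc, or libpcap.

```python
def sign_extend_32(k: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    return k - (1 << 32) if k & 0x80000000 else k


def ja_target(index: int, k: int) -> int:
    """Zero-based index of the instruction a `ja k` at `index` jumps to.

    The jump is relative to the instruction that follows the `ja`.
    """
    return index + 1 + sign_extend_32(k)


# Unsigned reading: far beyond a 36-instruction program.
print(0xFFFFFFF1)                   # 4294967281
# Signed reading: a short backward jump.
print(sign_extend_32(0xFFFFFFF1))   # -15
print(ja_target(18, 0xFFFFFFF1))    # 4
print(ja_target(31, 0xFFFFFFE4))    # 4
```

Under that reading, both `ja` instructions (lines 19 and 32 of the numbered dump) jump back to zero-based instruction 4, which is line 5 of the dump: the comparison against 0x3a (58) at the top of the loop.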

Searching the libpcap issues, this one seems relevant: https://github.com/the-tcpdump-group/libpcap/issues/1133. My understanding is that current libpcap can't generate correct cBPF instructions for protochain.

jschwinger233 avatar May 20 '25 12:05 jschwinger233

@jschwinger233 Good find (as always 😅)! @mss Are you OK with documenting this in KNOWN_ISSUES.md?

brb avatar May 22 '25 07:05 brb

@brb yes, totally. Some documentation or improved error handling was why I raised this issue so the next person running into this will hopefully have a less puzzling day.

Just a suggestion: Maybe the command line parser or whatever feeds the arguments to the bpf compiler part could also bail out if it encounters the string protochain as an argument?
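A minimal sketch of such a check, written in Python purely for illustration (pwru itself is a Go program, and nothing here is taken from its code). The function name `uses_protochain` is hypothetical. It assumes that splitting the filter expression into word tokens is good enough to spot the keyword; a host literally named "protochain" would be a false positive.

```python
import re

# Word-like tokens of a pcap filter expression; parentheses and operators are skipped.
TOKEN = re.compile(r"[A-Za-z0-9_.:-]+")


def uses_protochain(filter_expr: str) -> bool:
    """Return True if the filter expression contains the protochain keyword."""
    return any(tok.lower() == "protochain" for tok in TOKEN.findall(filter_expr))


for expr in ("ip6 proto 58", "ip6 protochain 58", "(ip protochain 6) or udp"):
    if uses_protochain(expr):
        print(f"rejected: {expr!r} (protochain is not supported)")
    else:
        print(f"accepted: {expr!r}")
```

A caller could run this before handing the expression to the filter compiler and print a clear message instead of the "flows past last instruction" error.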

mss avatar May 22 '25 07:05 mss

Thanks!

Maybe the command line parser or whatever feeds the arguments to the bpf compiler part could also bail out if it encounters the string protochain as an argument?

:+1:

brb avatar May 22 '25 07:05 brb

If the code in question uses pcap_compile() and pcap_setfilter(), these two functions aim to produce a filter bytecode that will run in the kernel if possible (regardless of the OS). If the kernel rejects the filter (as has been the case for protochain for a very long time), libpcap will resort to userland filtering. If libpcap has not resorted to userland filtering and the code uses pcap_setfilter(), this is a libpcap bug. In this case please describe the minimal steps to reproduce the bug.

If the code uses pcap_compile() but not pcap_setfilter() (supposedly some setsockopt() or ioctl()?), then it is the code's responsibility to handle an error returned by the kernel in whatever way is the most appropriate for the use case.

If in this use case it is not possible to resort to userland filtering, a possible workaround could be disabling the protochain primitive:

./configure --disable-protochain

On a related note, it would be a better feedback loop if pwru detected this case and reported a failure on the command line.

infrastation avatar May 23 '25 13:05 infrastation

Thank you, @infrastation. I believe it was my prematurely opened (and closed 😢) false bug report that brought you here.

At this point, I think the issue lies in the cBPF-to-eBPF translation library — see cloudflare/cbpfc#37. It's clear that libpcap generates correct cBPF for ip6 protochain 58 👍

Your suggestion to disable protochain for now sounds great — thank you!

jschwinger233 avatar May 23 '25 14:05 jschwinger233

You are welcome. It would be best not to set this workaround in stone: if libpcap starts to implement protochain without using a backwards jump (which is what most if not all kernel BPF implementations reject), --disable-protochain (and this problem) will disappear.
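One way to avoid tying a check to the keyword itself would be to look at the generated program instead: if it contains no backward jump, there is nothing to reject. The following is only a sketch under assumptions. It parses text in the `tcpdump -dd` format shown earlier in this thread, treats opcode 0x05 as an unconditional `ja`, and reads `k` as a signed 32-bit offset. The names `parse_dd` and `backward_jumps`, and the three-instruction sample, are invented for illustration.

```python
import re

# One "{ code, jt, jf, k }," entry as printed by `tcpdump -dd`.
INSN = re.compile(
    r"\{\s*(0x[0-9a-f]+),\s*(\d+),\s*(\d+),\s*(0x[0-9a-f]+)\s*\}", re.IGNORECASE
)
BPF_JA = 0x05  # BPF_JMP | BPF_JA: unconditional jump, offset in k


def parse_dd(text: str) -> list[tuple[int, int, int, int]]:
    """Turn `tcpdump -dd` output into (code, jt, jf, k) tuples."""
    return [
        (int(code, 16), int(jt), int(jf), int(k, 16))
        for code, jt, jf, k in INSN.findall(text)
    ]


def backward_jumps(program: list[tuple[int, int, int, int]]) -> list[tuple[int, int]]:
    """Return (index, target) pairs for every `ja` whose signed offset is negative."""
    hits = []
    for index, (code, _jt, _jf, k) in enumerate(program):
        if code == BPF_JA and k & 0x80000000:
            hits.append((index, index + 1 + k - (1 << 32)))
    return hits


# Hypothetical three-instruction program with one jump back to instruction 0.
sample = """
{ 0x28, 0, 0, 0x0000000c },
{ 0x5, 0, 0, 0xfffffffe },
{ 0x6, 0, 0, 0x00000000 },
"""
print(backward_jumps(parse_dd(sample)))  # [(1, 0)]
```

Only `ja` is examined because the conditional jump offsets (`jt`, `jf`) are small unsigned values and can only point forward.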

infrastation avatar May 23 '25 14:05 infrastation

libpcap generates correct cBPF

"Correct" as in "correct for a cBPF implementation that allows backward jumps"; the original one didn't allow them, in order to prevent stuffing an infinitely looping cBPF program into the kernel.

A verifier might be able to check whether each pass through a cBPF loop always looks at a location further into the packet data. If so, the loop will eventually either find what it's looking for or go past the end of the packet and terminate with a return value of 0, due to the implied bounds checking done by cBPF loads. In that case allowing loops in the kernel might be OK (although a loop that checks one byte at a time, skipping forward by one byte, might take a while to terminate).
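To make the termination argument concrete, here is a rough Python rendering of the kind of loop the protochain filter performs. It is a sketch, not libpcap's code: the header-length formulas follow the multiplications visible in the dump earlier in the thread ((len + 1) * 8 for extension headers, (len + 2) * 4 for AH), and the function and constant names are invented. The input is assumed to start at the IPv6 header, with no link-layer header.

```python
IPV6_HEADER_LEN = 40
EXT_HEADERS = {0, 43, 44, 60}  # hop-by-hop, routing, fragment, destination options
AH = 51                        # authentication header
NO_NEXT_HEADER = 59


def ip6_protochain(packet: bytes, wanted: int) -> bool:
    """Walk the IPv6 next-header chain looking for protocol `wanted`."""
    if len(packet) < IPV6_HEADER_LEN:
        return False
    proto = packet[6]          # Next Header field of the fixed IPv6 header
    offset = IPV6_HEADER_LEN
    while proto not in (wanted, NO_NEXT_HEADER):
        if offset + 2 > len(packet):
            return False       # ran past the packet: same effect as a failed cBPF load
        if proto in EXT_HEADERS:
            step = (packet[offset + 1] + 1) * 8
        elif proto == AH:
            step = (packet[offset + 1] + 2) * 4
        else:
            break              # not a header this walk knows how to skip
        proto = packet[offset]
        offset += step         # step >= 8, so every pass moves forward
    return proto == wanted


def ipv6_header(next_header: int) -> bytes:
    """Minimal 40-byte IPv6 header with zeroed addresses (test helper)."""
    return bytes([0x60, 0, 0, 0, 0, 0, next_header, 64]) + bytes(32)


# ICMPv6 (58) behind one 8-byte hop-by-hop options header.
pkt = ipv6_header(0) + bytes([58, 0, 0, 0, 0, 0, 0, 0]) + bytes(8)
print(ip6_protochain(pkt, 58))             # True
print(ip6_protochain(ipv6_header(6), 58))  # False
```

Because `step` is at least 8 on every pass, `offset` strictly increases and the loop can run at most about `len(packet) / 8` times, which is the property such a verifier would have to establish.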

guyharris avatar Jul 29 '25 23:07 guyharris