libpcap
protochain headache (please help)
I'm trying to use BPF to filter on the eth/vlan/ip/tcp headers that sit behind a GRE encapsulation.
Consider this example packet:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
0000 7c 91 69 9b d3 6d 10 ef 49 5f b1 e7 08 00 45 00
0010 00 84 00 00 00 00 fd 2f a5 fd 6a e3 9c b5 6a e3
0020 a4 d1 00 00 65 58 CA FE CA FE CA FE CA CA CA CA
0030 CA CA 81 00 00 69 08 00 45 20 00 5a a1 c5 40 00
0040 79 06 77 02 c0 a8 03 60 c0 a8 64 05 c2 27 0d 3d
0050 49 ca eb 16 b7 2c 3e 30 50 18 03 fd f1 66 00 00
0060 17 03 03 00 2d 00 00 00 00 00 01 c2 2b 47 fa 65
0070 d6 eb 3c 72 e4 4c 87 cb 4a 86 7f 87 c7 9f 9d 3b
0080 df f6 93 3f 20 48 5b b6 fc db 61 31 03 95 41 3c
0090 28 f8
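To make the layout of this packet concrete, here is a minimal Python sketch that walks the outer Ethernet/IPv4/GRE headers and reports where the encapsulated Ethernet frame starts. It is not libpcap code: the helper name `inner_ethernet_offset` is made up for illustration, only the first 80 bytes of the dump are embedded, and the deprecated GRE routing field is ignored.

```python
import struct

# First 80 bytes of the example packet, copied from the hex dump above.
PACKET_HEAD = bytes.fromhex(
    "7c 91 69 9b d3 6d 10 ef 49 5f b1 e7 08 00 45 00 "
    "00 84 00 00 00 00 fd 2f a5 fd 6a e3 9c b5 6a e3 "
    "a4 d1 00 00 65 58 ca fe ca fe ca fe ca ca ca ca "
    "ca ca 81 00 00 69 08 00 45 20 00 5a a1 c5 40 00 "
    "79 06 77 02 c0 a8 03 60 c0 a8 64 05 c2 27 0d 3d"
)

ETHERTYPE_IPV4 = 0x0800
IPPROTO_GRE = 0x2F
GRE_PROTO_TEB = 0x6558  # Transparent Ethernet Bridging (Ethernet over GRE)


def inner_ethernet_offset(pkt):
    """Hypothetical helper: offset of the Ethernet frame inside GRE, or None."""
    (ethertype,) = struct.unpack_from("!H", pkt, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip_off = 14
    ip_len = (pkt[ip_off] & 0x0F) * 4  # IHL is counted in 32-bit words
    if pkt[ip_off + 9] != IPPROTO_GRE:
        return None
    gre_off = ip_off + ip_len
    flags, proto = struct.unpack_from("!HH", pkt, gre_off)
    if proto != GRE_PROTO_TEB:
        return None
    # 4-byte base header; the checksum, key and sequence bits each add 4 bytes.
    gre_len = 4 + 4 * sum(1 for bit in (0x8000, 0x2000, 0x1000) if flags & bit)
    return gre_off + gre_len


off = inner_ethernet_offset(PACKET_HEAD)
print(off)                                      # 38
print(PACKET_HEAD[off:off + 6].hex(":"))        # ca:fe:ca:fe:ca:fe (inner destination)
print(PACKET_HEAD[off + 6:off + 12].hex(":"))   # ca:ca:ca:ca:ca:ca (inner source)
```

So the address I want to match is the inner source MAC, at absolute offsets 44 to 49 of the captured frame.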
If we try something "simple" that only checks for the CA:CA:CA:CA:CA:CA Ethernet address, let's say (protochain GRE && ether host CA:CA:CA:CA:CA:CA), we get the following code:
dumpcap -d -f "(protochain GRE && ether host CA:CA:CA:CA:CA:CA)"
Capturing on 'wlo1'
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 22
(002) ldb [23]
(003) ldxb 4*([14]&0xf)
(004) jeq #0x2f jt 20 jf 5
(005) jeq #0x3b jt 20 jf 6
(006) add #0
(007) jeq #0x33 jt 8 jf 20
(008) txa
(009) ldb [x + 14]
(010) st M[1]
(011) txa
(012) add #1
(013) tax
(014) ldb [x + 14]
(015) add #2
(016) mul #4
(017) tax
(018) ld M[1]
(019) ja 4
(020) add #0
(021) jeq #0x2f jt 56 jf 22
(022) ldh [12]
(023) jeq #0x86dd jt 24 jf 65
(024) ldb [20]
(025) ldx #0x28
(026) jeq #0x2f jt 54 jf 27
(027) jeq #0x3b jt 54 jf 28
(028) jeq #0x0 jt 32 jf 29
(029) jeq #0x3c jt 32 jf 30
(030) jeq #0x2b jt 32 jf 31
(031) jeq #0x2c jt 32 jf 41
(032) ldb [x + 14]
(033) st M[1]
(034) ldb [x + 15]
(035) add #1
(036) mul #8
(037) add x
(038) tax
(039) ld M[1]
(040) ja 26
(041) jeq #0x33 jt 42 jf 54
(042) txa
(043) ldb [x + 14]
(044) st M[1]
(045) txa
(046) add #1
(047) tax
(048) ldb [x + 14]
(049) add #2
(050) mul #4
(051) tax
(052) ld M[1]
(053) ja 26
(054) add #0
(055) jeq #0x2f jt 56 jf 65
(056) ld [8]
(057) jeq #0xcacacaca jt 58 jf 60
(058) ldh [6]
(059) jeq #0xcaca jt 64 jf 60
(060) ld [2]
(061) jeq #0xcacacaca jt 62 jf 65
(062) ldh [0]
(063) jeq #0xcaca jt 64 jf 65
(064) ret #262144
(065) ret #0
Unless I'm missing something, for this packet the code only executes in this order:
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 22 # True!
(002) ldb [23] # offset 0x17 holds 0x2f (GRE)
(003) ldxb 4*([14]&0xf)
(004) jeq #0x2f jt 20 jf 5 # True!
(020) add #0 # do nothing ??
(021) jeq #0x2f jt 56 jf 22 # True again!
(056) ld [8] # Oops, we are not behind the GRE header!!
(057) jeq #0xcacacaca jt 58 jf 60 # False :-(
(060) ld [2] # Oops, we are not behind the GRE header!!
(061) jeq #0xcacacaca jt 62 jf 65 # False again....
(065) ret #0 # mismatch!
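The last steps of that trace can be checked by hand. The sketch below is a rough Python stand-in for the absolute loads in instructions (056) to (063), run against the first 14 bytes of the example packet. The function names are mine and only mimic the BPF mnemonics.

```python
import struct

# Outer Ethernet header of the example packet (first 14 bytes of the dump).
OUTER_ETHERNET = bytes.fromhex("7c 91 69 9b d3 6d 10 ef 49 5f b1 e7 08 00")


def ld(pkt, k):
    """Stand-in for BPF 'ld [k]': 32-bit big-endian load at absolute offset k."""
    return struct.unpack_from("!I", pkt, k)[0]


def ldh(pkt, k):
    """Stand-in for BPF 'ldh [k]': 16-bit big-endian load at absolute offset k."""
    return struct.unpack_from("!H", pkt, k)[0]


def ether_host_matches(pkt, mac):
    """Same comparisons as instructions (056)-(063): source first, then destination."""
    hi16, lo32 = struct.unpack("!HI", mac)
    src = ld(pkt, 8) == lo32 and ldh(pkt, 6) == hi16
    dst = ld(pkt, 2) == lo32 and ldh(pkt, 0) == hi16
    return src or dst


print(hex(ld(OUTER_ETHERNET, 8)))   # 0x495fb1e7, tail of the outer source MAC
print(hex(ld(OUTER_ETHERNET, 2)))   # 0x699bd36d, tail of the outer destination MAC
print(ether_host_matches(OUTER_ETHERNET, bytes.fromhex("cacacacacaca")))  # False
```

Both loads use fixed offsets from the start of the frame, so they can only ever see the outer MAC addresses.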
Unless I have misunderstood the behaviour of protochain, it should "virtually move" the packet pointer behind the GRE header, so that we can filter there.
I'm really confused, because everything I found on the internet says that the filter (protochain GRE && ether host CA:CA:CA:CA:CA:CA) should work.
This has been thoroughly tested on Fedora 37 and replicated on Ubuntu 22.
- libpcap: 1.10.1
- Dumpcap (Wireshark) 3.6.8 (Git commit d25900c51508)
- Other data that might be interesting:
Running on Linux 5.19.15-301.fc37.x86_64, with Intel(R) Core(TM) i5-8250U CPU @
1.60GHz (with SSE4.2), with 32015 MB of physical memory, with GLib 2.74.0, with
zlib 1.2.12, with libpcap 1.10.1 (with TPACKET_V3), with LC_TYPE=C, binary
plugins supported (0 loaded).
Any help would be welcomed. Thank you in advance.
That's an interesting question. The packet with a convenience pcap savefile header prepended is base64-encoded below:
1MOyoQIABAAAAAAAAAAAAP//AAABAAAACgoKCgsLCwuSAAAAkgAAAHyRaZvTbRDvSV+x5wgARQAA
hAAAAAD9L6X9auOctWrjpNEAAGVYyv7K/sr+ysrKysrKgQAAaQgARSAAWqHFQAB5BncCwKgDYMCo
ZAXCJw09ScrrFrcsPjBQGAP98WYAABcDAwAtAAAAAAABwitH+mXW6zxy5EyHy0qGf4fHn5073/aT
PyBIW7b822ExA5VBPCj4
Using Wireshark ([Protocols in frame: eth:ethertype:ip:gre:eth:ethertype:vlan:ethertype:ip:tcp:tls]), it is easy to notice that both ip proto gre and ip protochain gre match the packet, but ether host ca:ca:ca:ca:ca:ca does not (which is expected). Also ip protochain \tcp does not match the packet, which indicates that ip protochain does not chase the protocol header stack (which would be understandably difficult or impossible using BPF) and consequently would not be able to advance the packet data offset for the ether host predicate by an unknown amount.
Please note that protochain P is a shorthand for ip protochain P or ip6 protochain P, so more than half of the BPF code above is a no-op on this packet. However, even the IPv4-only version visualized using BPF Exam does not immediately remind me of anything particular. I agree that it would be useful to reconstruct the meaning of ip protochain and to update the man page with more specifics. The add 0 instruction likely indicates that the code has some room for improvement.
Thanks for your quick reply @infrastation.
It's a pity to hear that you cannot "chain" protocols (such as VXLAN/GRE, which are common in today's SDN/ERSPAN monitoring). Correct me if I have misunderstood you.
Reading the "BPF assembly", I don't think it would be too hard to "chain" protocols. It should be as "easy" as keeping all loads as ldxb, where x is the offset of whatever has already been parsed. By adding some context with something like parentheses, it would be possible to write something like:
host <erspan source> && (chain GRE && vlan && host <interesting IP>)
or, even simpler for most SDN environments:
chain VXLAN && host <interesting IP>
I mean, the vlan keyword already "moves" the packet pointer from that point on (not very cleanly, in my opinion), so I don't see any strong reason to avoid adding more keywords that move the packet pointer.
I'm not familiar with the libpcap internals, and this suggestion is probably not quick or simple to implement. But anyone who thinks about it will quickly realize that this functionality is needed more and more as encapsulated traffic becomes more popular everywhere.
Thanks for your support.
The BPF does the filtering, and anything is possible with the right BPF compiled in. Decoding is a different kettle of fish, and that's done in the tcpdump code, not in the libpcap code.
Thanks for your reply @mcr,
But that's exactly what I'm proposing: using BPF to filter tunneled headers. If you want to use a network probe with ERSPAN or a simple local mirror under a VXLAN SDN, it is common to want to filter the "real" communication instead of (just) the transport one.
From my point of view, chaining protocols is a must in order to use BPF efficiently in the new SDN era. (I don't like SDN btw, but it is what it is)
Hardcoding ether[n]==CA && ether[n+1]==CA && ... is not a usable way of filtering MACs (or IPs, or whatever), especially with variable-size protocols in the middle.
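For reference, this is roughly what the hardcoded workaround looks like when generated for the example packet. It is a sketch under strong assumptions: the helper name `inner_mac_filter` is invented, the outer IPv4 header is assumed to carry no options (20 bytes), the GRE header is assumed to have no optional fields (4 bytes), and I have not run the generated expression through libpcap here.

```python
def inner_mac_filter(mac, field_offset, outer_ip_header_len=20, gre_header_len=4):
    """Hypothetical helper: build a fixed-offset filter for an inner MAC address.

    field_offset is 0 for the inner destination MAC and 6 for the inner source MAC.
    The header lengths are assumptions baked into the expression as constants.
    """
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("expected a 6-byte MAC address")
    # 14 bytes of outer Ethernet, then outer IPv4, then GRE, then the inner field.
    base = 14 + outer_ip_header_len + gre_header_len + field_offset
    hi32 = int.from_bytes(octets[:4], "big")
    lo16 = int.from_bytes(octets[4:], "big")
    return (
        f"ip proto 47 and ether[{base}:4] = 0x{hi32:08x} "
        f"and ether[{base + 4}:2] = 0x{lo16:04x}"
    )


print(inner_mac_filter("ca:ca:ca:ca:ca:ca", field_offset=6))
# ip proto 47 and ether[44:4] = 0xcacacaca and ether[48:2] = 0xcaca
```

As soon as the outer IP header has options or GRE carries a key or sequence number, the constants 44 and 48 are wrong, which is exactly the problem.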
The add #0 statement is an intentional no-op in gencode.c:gen_protochain(); from the function code it seems that in the IPv4 case it would skip an AH header and do something else that I do not understand yet.
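As an aid to reading it, here is a line-by-line Python transliteration of the IPv4 branch of the listing posted earlier in this thread, instructions (000) to (021). It mirrors the disassembly only, not gencode.c. The `target` parameter is my assumption that another protocol would differ only in the compared constant, and an out-of-range load is treated as "no match".

```python
def ip_protochain(pkt, target):
    """Transliteration of instructions (000)-(021); IPv4 branch only."""
    try:
        if int.from_bytes(pkt[12:14], "big") != 0x0800:  # (000)-(001)
            return False              # the listing would try the IPv6 branch here
        a = pkt[23]                   # (002) ldb [23]: outer IP protocol field
        x = 4 * (pkt[14] & 0x0F)      # (003) ldxb 4*([14]&0xf): outer IP header length
        while True:
            if a == target or a == 0x3B:  # (004)-(005): found, or "no next header"
                break
            # (006) add #0: no-op
            if a != 0x33:             # (007): anything other than AH ends the walk
                break
            saved = pkt[x + 14]       # (008)-(010): AH next-header byte, kept in M[1]
            x += 1                    # (011)-(013)
            x = (pkt[x + 14] + 2) * 4  # (014)-(017): X is overwritten with the AH length
            a = saved                 # (018) ld M[1], then (019) ja 4
        return a == target            # (020) add #0, (021) jeq
    except IndexError:
        return False                  # assumption: a failed load rejects the packet


# First 32 bytes of the example packet from the top of the thread.
HEAD = bytes.fromhex(
    "7c 91 69 9b d3 6d 10 ef 49 5f b1 e7 08 00 45 00 "
    "00 84 00 00 00 00 fd 2f a5 fd 6a e3 9c b5 6a e3"
)
print(ip_protochain(HEAD, 0x2F))  # True: the outer protocol is GRE
print(ip_protochain(HEAD, 0x06))  # False: the inner TCP header is never reached
```

On the example packet the loop body never runs, since the very first protocol byte is already 0x2f. The only header this branch steps over is AH (0x33), which matches the observation that ip protochain \tcp does not match.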