
libpcap: added raw filters

Open chemag opened this issue 8 years ago • 16 comments

The goal of supporting raw filters* is to give libpcap/tcpdump support for generic BPF insns, including those that libpcap does not support (e.g., the BPF_MOD/BPF_XOR ops in Linux, or any of the multiple ancillary loads in Linux). It also allows testing new kernel extensions to the BPF ISA without having to modify libpcap/tcpdump.

We provide support by modifying pcap_compile() so that it first checks for raw filters. This works both for expressions appended on the command line and for expressions read from a file ("-F" option). Filters starting with an integer followed by a valid separator (',' or '\n') are considered raw. All other filters are considered (traditional) expressions.
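To make the detection rule concrete, here is a minimal sketch in C. The helper name `looks_like_raw_filter` is hypothetical and is not taken from the patch; the sketch also assumes that no leading whitespace is allowed before the integer, which the description above does not specify.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the patch's actual code: return 1 if `expr`
 * starts with a decimal integer immediately followed by ',' or '\n',
 * and 0 otherwise (i.e. treat it as a traditional expression).
 */
static int
looks_like_raw_filter(const char *expr)
{
	size_t i = 0;

	assert(expr != NULL);

	/* Require at least one leading decimal digit. */
	if (!isdigit((unsigned char)expr[0]))
		return 0;
	while (isdigit((unsigned char)expr[i]))
		i++;

	/* The leading integer must be followed by a valid separator. */
	return expr[i] == ',' || expr[i] == '\n';
}
```

With this rule, "6,40 0 0 12,..." is classified as raw, while "icmp" falls through to the traditional expression compiler.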

We also make sure that filters compiled from raw filters are left for the kernel to validate (added "skip_validate" to pcap_t).

*Raw filters are those generated by tcpdump -ddd. i.e.

$ ./tcpdump -ddd -i eth0 icmp
6
40 0 0 12
21 0 3 2048
48 0 0 23
21 0 1 1
6 0 0 65535
6 0 0 0

We also support replacing the newline characters with commas, which makes it possible to write inline filters.

$ ./tcpdump -ddd -i eth0 icmp | tr '\n' ','
6,40 0 0 12,21 0 3 2048,48 0 0 23,21 0 1 1,6 0 0 65535,6 0 0 0,
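As an illustration of the format, the following C sketch parses either form (newline-separated or comma-separated) into an instruction array. It is not the patch's pcap_compile_raw(); the names `raw_insn` and `parse_raw_filter` are hypothetical, and the struct is redeclared locally (with the same fields as libpcap's struct bpf_insn) only so the sketch stands alone. It assumes the first integer is the instruction count and that a trailing separator is optional.

```c
#include <assert.h>
#include <stdio.h>

/* Same fields as libpcap's struct bpf_insn; redeclared so the sketch is standalone. */
struct raw_insn {
	unsigned short code;
	unsigned char jt;
	unsigned char jf;
	unsigned int k;
};

/*
 * Hypothetical parser for "count SEP code jt jf k SEP ..." where SEP is
 * ',' or '\n'. Returns the number of instructions stored in `out`, or -1
 * if the text is malformed or holds more than `max` instructions.
 */
static int
parse_raw_filter(const char *text, struct raw_insn *out, int max)
{
	unsigned int count, code, jt, jf, k;
	int used, n;

	assert(text != NULL && out != NULL);

	/* First field: the instruction count, then a separator. */
	if (sscanf(text, "%u%n", &count, &used) != 1)
		return -1;
	text += used;
	if (*text != ',' && *text != '\n')
		return -1;
	text++;
	if (count == 0 || max < 0 || count > (unsigned int)max)
		return -1;

	for (n = 0; n < (int)count; n++) {
		if (sscanf(text, "%u %u %u %u%n", &code, &jt, &jf, &k, &used) != 4)
			return -1;
		if (code > 0xffff || jt > 0xff || jf > 0xff)
			return -1;
		text += used;

		/* A separator is mandatory between instructions, optional at the end. */
		if (*text == ',' || *text == '\n')
			text++;
		else if (n != (int)count - 1)
			return -1;

		out[n].code = (unsigned short)code;
		out[n].jt = (unsigned char)jt;
		out[n].jf = (unsigned char)jf;
		out[n].k = k;
	}

	/* Reject trailing garbage after the declared number of instructions. */
	return *text == '\0' ? (int)count : -1;
}
```

Both the multi-line `-ddd` output and the comma-joined variant above parse to the same six instructions under these assumptions. The kernel, not this parser, is expected to judge whether the instructions themselves are valid.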

Some examples:

$ ./tcpdump -nn -i eth0 "6,40 0 0 12,21 0 3 2048,48 0 0 23,21 0 1 1,6 0 0 65535,6 0 0 0,"
$ ./tcpdump -nn -i eth0 -F ~/bpf/icmp.2.bpfraw

chemag avatar Feb 28 '17 19:02 chemag

That's a bit of a big Git comment. Should we discard everything starting with "*Raw filters are those generated by tcpdump -ddd. i.e.", and change the first paragraph to read

The goal of supporting raw filters (those generated by tcpdump -ddd) ...

guyharris avatar Feb 28 '17 19:02 guyharris

I like large comments (it's a doc, after all), but please feel free to remove what you decide

chemag avatar Feb 28 '17 19:02 chemag

Is there any reason to run raw filters through the optimizer? Presumably if a user specifies a raw filter, they want exactly that chunk of BPF machine code to be used.

guyharris avatar Feb 28 '17 19:02 guyharris

Documentation for the user would belong in a man page; documentation for libpcap developers would belong in a comment in the code.

guyharris avatar Feb 28 '17 19:02 guyharris

> Documentation for the user would belong in a man page; documentation for libpcap developers would belong in a comment in the code.

I removed all the test notes.

chemag avatar Feb 28 '17 19:02 chemag

> Is there any reason to run raw filters through the optimizer? Presumably if a user specifies a raw filter, they want exactly that chunk of BPF machine code to be used.

AFAICT, we don't run the optimizer when we call pcap_compile_raw(): in that case, pcap_compile() returns before calling the optimizer.

chemag avatar Feb 28 '17 19:02 chemag

Please note that this pull request failed to build on macOS.

infrastation avatar Mar 01 '17 15:03 infrastation

> Also used 4096 instead of 64k (that's what the kernel defines).

Actually, the kernel defines it as 512, as do the kernel and the kernel and the kernel and the kernel. The kernel also appears to do so, although Oracle's version may have diverged from the last OpenSolaris version.

Unfortunately, the kernel isn't open-source, so I can't post a URL. The kernel-mode driver also defines it as 512.

Oh, and the kernel doesn't have BPF, so it doesn't define it as anything.

Translation: there's no such thing as "the kernel" in the context of libpcap; there are a number of kernels it deals with - that's the whole point of libpcap; it hides, as best it can, the variety of packet capture mechanisms, so that code can be written to run on several different OSes with a minimum of platform-specific #ifdefs.

Note also that any one of those kernels might change the value in the future.

So my inclination is not to pay attention to what any particular kernel happens to choose, and not to make an effort to try to find the appropriate header on various different platforms. Just pick an arbitrary maximum, and give it a name other than BPF_MAXINSNS, to emphasize that it's a libpcap limit rather than any particular OS's limit.
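A minimal sketch of that suggestion in C follows. The macro name `PCAP_RAW_FILTER_MAX_INSNS` and the helper `check_raw_filter_len` are hypothetical, and 4096 is simply the value mentioned earlier in this thread; the point is only that the limit is named as libpcap's own rather than borrowed from any OS header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical name and value: an arbitrary libpcap-side cap on the
 * length of a raw filter. Deliberately not called BPF_MAXINSNS, because
 * it is not any particular kernel's limit.
 */
#define PCAP_RAW_FILTER_MAX_INSNS 4096

/*
 * Return 0 if `count` instructions fit within libpcap's own limit;
 * otherwise write a message into `errbuf` and return -1.
 */
static int
check_raw_filter_len(unsigned long count, char *errbuf, size_t errbuf_size)
{
	assert(errbuf != NULL && errbuf_size > 0);

	if (count == 0 || count > PCAP_RAW_FILTER_MAX_INSNS) {
		snprintf(errbuf, errbuf_size,
		    "raw filter has %lu instructions (libpcap limit is %d)",
		    count, PCAP_RAW_FILTER_MAX_INSNS);
		return -1;
	}
	errbuf[0] = '\0';
	return 0;
}
```

A filter that passes this check may still be rejected by a kernel whose own limit is lower; the check only bounds what libpcap itself is willing to parse.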

guyharris avatar Mar 01 '17 21:03 guyharris

> Actually, the kernel defines it as 512, as do the kernel and the kernel and the kernel and the kernel ...

My bad. There's only one kernel for me :)

> So my inclination is not to pay attention to what any particular kernel happens to choose, and not to make an effort to try to find the appropriate header on various different platforms. Just pick an arbitrary maximum, and give it a name other than BPF_MAXINSNS, to emphasize that it's a libpcap limit rather than any particular OS's limit.

Done

chemag avatar Mar 01 '17 22:03 chemag

Ping

chemag avatar Mar 03 '17 20:03 chemag

Ping

chemag avatar Mar 10 '17 19:03 chemag

One more ping...

chemag avatar Mar 28 '17 02:03 chemag

if there is still interest, please rebase, thank you!

mcr avatar Apr 26 '19 14:04 mcr

> if there is still interest, please rebase, thank you!

Done.

Tested again, and added a "Tested:" section to the patch comment.

Thanks!

chemag avatar Apr 29 '19 18:04 chemag

Also, per a previous comment from Guy, I moved a chunk of the comment to the pcap_compile man page

chemag avatar Apr 29 '19 19:04 chemag

On one hand, merging these changes into libpcap would improve consistency in tcpdump: since it can print the compiled bytecode with -ddd, it would be reasonable if it could parse and use this compiled bytecode (as iptables -I INPUT -m bpf --bytecode does), and for that libpcap (but not the programs that use libpcap, as Guy explained) would have to recognize it.

On the other hand, the "ddd" format has a pitfall: it does not specify the DLT for which the expression was compiled. This leaves room for things to go wrong quietly, for example when the end user compiles a filter for the popular DLT_EN10MB type, whereas in the iptables context above the bytecode is applied to what is effectively DLT_RAW.
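The DLT pitfall can be shown with the "icmp" bytecode quoted earlier in this thread. The C sketch below is a toy interpreter for only the four opcodes that program uses; it is not libpcap's bpf_filter(), and the names `raw_insn`, `icmp_en10mb` and `run_raw_filter` are hypothetical. The test packets are assumed to be a bare 14-byte Ethernet header plus a 20-byte IPv4 header, and the same IPv4 header on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as libpcap's struct bpf_insn; redeclared so the sketch is standalone. */
struct raw_insn {
	unsigned short code;
	unsigned char jt;
	unsigned char jf;
	unsigned int k;
};

/* "icmp" as printed by tcpdump -ddd for an Ethernet (DLT_EN10MB) interface. */
static const struct raw_insn icmp_en10mb[] = {
	{ 40, 0, 0, 12 },	/* ldh [12]      load EtherType */
	{ 21, 0, 3, 2048 },	/* jeq #0x800    IPv4? else go to "ret #0" */
	{ 48, 0, 0, 23 },	/* ldb [23]      IPv4 protocol (14 + 9) */
	{ 21, 0, 1, 1 },	/* jeq #1        ICMP? else go to "ret #0" */
	{ 6, 0, 0, 65535 },	/* ret #65535    accept */
	{ 6, 0, 0, 0 },		/* ret #0        reject */
};

/*
 * Toy interpreter for the opcodes above only. Returns the program's
 * return value (0 means "reject"); out-of-range loads and unknown
 * opcodes also reject.
 */
static unsigned int
run_raw_filter(const struct raw_insn *prog, size_t n,
    const unsigned char *pkt, size_t len)
{
	unsigned int a = 0;	/* accumulator */
	size_t pc = 0;

	assert(prog != NULL && pkt != NULL);

	while (pc < n) {
		const struct raw_insn *in = &prog[pc++];

		switch (in->code) {
		case 40:	/* BPF_LD|BPF_H|BPF_ABS: big-endian halfword */
			if ((size_t)in->k + 2 > len)
				return 0;
			a = ((unsigned int)pkt[in->k] << 8) | pkt[in->k + 1];
			break;
		case 48:	/* BPF_LD|BPF_B|BPF_ABS: single byte */
			if ((size_t)in->k >= len)
				return 0;
			a = pkt[in->k];
			break;
		case 21:	/* BPF_JMP|BPF_JEQ|BPF_K: relative jump */
			pc += (a == in->k) ? in->jt : in->jf;
			break;
		case 6:		/* BPF_RET|BPF_K */
			return in->k;
		default:
			return 0;
		}
	}
	return 0;
}
```

Run against an Ethernet frame carrying ICMP, the program accepts. Run against the same packet without its link-layer header, as with DLT_RAW, offset 12 no longer holds the EtherType but the first two bytes of the IPv4 source address, so the program rejects the packet without reporting any error.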

So perhaps this change and cBPF savefile address related but distinct use cases, and the only coordination between the two should be that the complete API in the end makes as much sense as possible.

infrastation avatar Sep 19 '22 21:09 infrastation