
Simple filter of 1000 hosts takes O(n²) to compile

Open ole-tange opened this issue 1 year ago • 3 comments

When tcpdump is given a filter of 1000 hosts to ignore, compiling that filter takes at least O(n²) time and RAM.

I would have expected it to take O(n), or at worst O(n log n), for such a simple filter.

The memory usage during the compilation phase is also drastic (1 GB for 1000 hosts, 8 GB for 3000 hosts).

After compilation of the filter both CPU usage and memory usage are as expected.

Build

$ git clone https://github.com/the-tcpdump-group/libpcap
$ cd libpcap
$ ./autogen.sh 
$ ./configure 
$ make -j5
$ sudo make install
$ cd ..

$ git clone https://github.com/the-tcpdump-group/tcpdump
$ cd tcpdump
$ ./autogen.sh 
$ ./configure 
$ make -j5
$ sudo make install
$ cd ..

Versions

$ tcpdump  --version
tcpdump version 5.0.0-PRE-GIT
libpcap version 1.11.0-PRE-GIT (with TPACKET_V3)
OpenSSL 3.0.2 15 Mar 2022
$ uname -a
Linux travel 6.5.0-14-lowlatency #14.1~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Nov 22 16:24:11 UTC x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

MCVE

$ filter() {
    # $1 = How many entries?
    perl -e '
        $pre = "not (";
        $post = ")";
        $join = " or ";
        sub hostport {
          $host = sprintf "%d.%d.%d.%d", rand()*255,rand()*255,rand()*255,rand()*255;
          $port = sprintf "%d", rand()*65535;
          return "(host $host and port $port)";
        }
        print $pre, join($join,map { hostport() } 1..shift), $post;
    ' $1
}
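For environments without perl, an equivalent generator can be sketched in pure bash (`gen_filter` is a hypothetical name, not part of the original MCVE; it emits the same `not ((host A and port P) or ...)` shape):

```shell
# Hypothetical pure-bash equivalent of the perl generator above.
gen_filter() {
  local n=$1 i expr=""
  for ((i = 0; i < n; i++)); do
    local host="$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256)).$((RANDOM % 256))"
    local port=$((RANDOM % 65536))
    # Insert " or " between terms, but not before the first one.
    expr+="${expr:+ or }(host $host and port $port)"
  done
  printf 'not (%s)' "$expr"
}
```

Running `tcpdump -c 1 "$(gen_filter 1000)"` should then reproduce the same compilation blow-up.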
$ export -f filter
$ sudo time -v tcpdump -c 1 `filter 300`
# Around 1s 100M
$ sudo time -v tcpdump -c 1 `filter 1000`
# Around 15s 1000M
$ sudo time -v tcpdump -c 1 `filter 3000`
# Around 160s 8000M

The runs above measure the time and memory needed to compile the filter for 300, 1000, and 3000 hosts.
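As a rough sanity check (my own arithmetic, using the approximate timings quoted above), the measured time ratios track the squared host-count ratios, which is what O(n²) predicts:

```shell
# Compare measured time ratios with squared host-count ratios;
# for O(n^2) compilation the two numbers in each row should be close.
awk 'BEGIN {
  t300 = 1; t1000 = 15; t3000 = 160            # seconds, from the runs above
  printf "1000 vs 300 hosts:  time x%.1f,  n^2 x%.1f\n", t1000/t300,  (1000/300)^2
  printf "3000 vs 1000 hosts: time x%.1f,  n^2 x%.1f\n", t3000/t1000, (3000/1000)^2
}'
```

The agreement (x15 vs x11.1, and x10.7 vs x9) is loose but consistent with quadratic growth.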

ole-tange avatar Jan 20 '24 01:01 ole-tange

Reproduces as described. Consumption of RAM seems to be specific to the BPF optimizer (on by default).

infrastation avatar Jan 20 '24 12:01 infrastation

You are certainly right that it could be better: send code!
eBPF would let you put the list of hosts into some kind of map, but we don't have that in userspace, nor can we assume it exists.

mcr avatar Jan 21 '24 20:01 mcr

> Reproduces as described. Consumption of RAM seems to be specific to the BPF optimizer (on by default).

Disabling the optimizer (-O) does not solve the issue. See graph 2 on https://unix.stackexchange.com/questions/767162/tcpdump-takes-on%C2%B2-time-to-parse-filter

ole-tange avatar Jan 25 '24 03:01 ole-tange