short format IPv4 in /etc/networks has a number of problems

Open mabra opened this issue 9 months ago • 34 comments

tcpdump fails to recognize or operate on names in /etc/networks, example line:

lan2   192.168.12.0

tcpdump -i any "net lan" does not give any output. Using the address works. Note: Using inotify I saw open/read/close on the file. OS: debian 11

$ tcpdump --version
tcpdump version 4.99.0
libpcap version 1.10.0 (with TPACKET_V3)
OpenSSL 1.1.1w  11 Sep 2023

Is there something to fix it? Thanks.

mabra avatar Feb 15 '25 13:02 mabra

At a glance, the commands, if run exactly as shown, ought to result in:

tcpdump: unknown network 'lan'

So either you run tcpdump -i lan2 or lan has another entry in /etc/networks. Let's suppose that is a typo and see what filter program this produces:

$ tcpdump -y LINUX_SLL -d net lan2
(000) ldh      [14]
(001) jeq      #0x800           jt 2	jf 6
(002) ld       [28]
(003) jeq      #0xc0a80c00      jt 12	jf 4
(004) ld       [32]
(005) jeq      #0xc0a80c00      jt 12	jf 13
(006) jeq      #0x806           jt 8	jf 7
(007) jeq      #0x8035          jt 8	jf 13
(008) ld       [30]
(009) jeq      #0xc0a80c00      jt 12	jf 10
(010) ld       [40]
(011) jeq      #0xc0a80c00      jt 12	jf 13
(012) ret      #262144
(013) ret      #0

This is a bit messy because it has to account for more than one EtherType. Let's look at the IPv4 only:

$ tcpdump -y IPV4 -d net lan2
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0xc0a80c00      jt 5	jf 3
(003) ld       [16]
(004) jeq      #0xc0a80c00      jt 5	jf 6
(005) ret      #262144
(006) ret      #0

$ tcpdump -y IPV4 -d net 192.168.12.0
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0xc0a80c00      jt 5	jf 3
(003) ld       [16]
(004) jeq      #0xc0a80c00      jt 5	jf 6
(005) ret      #262144
(006) ret      #0

$ tcpdump -y IPV4 -d host 192.168.12.0
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0xc0a80c00      jt 5	jf 3
(003) ld       [16]
(004) jeq      #0xc0a80c00      jt 5	jf 6
(005) ret      #262144
(006) ret      #0

This way, /etc/networks is not a factor, but it is clear that net 192.168.12.0 and host 192.168.12.0 produce the same result, which looks a bit odd. Let me check if this is the expected behaviour.

infrastation avatar Feb 15 '25 14:02 infrastation

The meaning of net 192.168.12.0 is the same as that of host 192.168.12.0, which is the documented behaviour:

An IPv4 network number can be written as a dotted quad (e.g., 192.168.1.0), dotted triple (e.g., 192.168.1), dotted pair (e.g, 172.16), or single number (e.g., 10); the netmask is 255.255.255.255 for a dotted quad (which means that it's really a host match), 255.255.255.0 for a dotted triple, 255.255.0.0 for a dotted pair, or 255.0.0.0 for a single number.
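The rule quoted above can be sketched in a few lines. This is only an illustration of the documented behaviour, not libpcap code; the function name `parse_net_literal` is hypothetical, and the sketch accepts decimal components only.

```python
def parse_net_literal(text):
    """Return (address, netmask) for a dotted 1- to 4-component IPv4 network.

    Hypothetical helper: mirrors the documented rule that the netmask
    has one 0xff byte per component that was actually written.
    """
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted IPv4 network: {text!r}")
    # Missing low-order components are treated as zero.
    padded = parts + [0] * (4 - len(parts))
    address = int.from_bytes(bytes(padded), "big")
    # A dotted quad gets 255.255.255.255, a triple 255.255.255.0, and so on.
    netmask = (0xFFFFFFFF << (8 * (4 - len(parts)))) & 0xFFFFFFFF
    return address, netmask


for literal in ("192.168.12.0", "192.168.12", "172.16", "10"):
    address, netmask = parse_net_literal(literal)
    print(f"{literal:14} address={address:#010x} netmask={netmask:#010x}")
```

With this rule a dotted quad is effectively a host match, which is why net 192.168.12.0 and host 192.168.12.0 compile to the same filter program.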

This way, without /etc/networks giving the first three octets of the network applies the expected /24 netmask:

$ tcpdump -y IPV4 -d src net 192.168.12
(000) ld       #0x0
(001) ld       [12]
(002) and      #0xffffff00
(003) jeq      #0xc0a80c00      jt 4	jf 5
(004) ret      #262144
(005) ret      #0

However, using the notation above in Linux /etc/networks does not have the same effect and the filter program still uses a /32 netmask:

$ grep lan12 /etc/networks
lan12 192.168.12

$ ./tcpdump -y IPV4 -d src net lan12
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0xc0a80c00      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

In the former (/24) case the code path is through gen_ncode(), which converts the address using pcapint_atoin(). In the latter (/32) case the code path is through gen_scode(), which converts the address using pcap_nametonetaddr(), which means either the two functions are not coupled properly to handle the short network syntax, or the Linux implementation of getnetbyname_r() returns not exactly what pcap_nametonetaddr() expects to receive. This requires a bit more investigation because pcap_nametonetaddr() uses three different implementations of getnetbyname_r().
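To make the practical difference between the two filter programs concrete, here is a small sketch that mimics the comparison each of them performs on the IPv4 source address. It is not a BPF interpreter; `matches_src` and the sample host address are made up for illustration.

```python
def matches_src(src, value, mask=0xFFFFFFFF):
    """Mimic 'ld [12]; and #mask; jeq #value' on an IPv4 source address.

    Omitting the mask corresponds to a program with no 'and' step,
    which is the same as comparing all 32 bits.
    """
    return (src & mask) == value


NET = 0xC0A80C00   # 192.168.12.0
HOST = 0xC0A80C07  # 192.168.12.7, an arbitrary host on that network

# "src net 192.168.12": the program contains "and #0xffffff00".
with_mask = matches_src(HOST, NET, 0xFFFFFF00)
# "src net lan12": no "and" step, so only 192.168.12.0 itself can match.
without_mask = matches_src(HOST, NET)

print(with_mask, without_mask)  # True False
```

In other words, the /32 program silently matches nothing on a typical network, which fits the "no output" symptom in the original report.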

Meanwhile the workaround would be to use tcpdump -i any 'net 192.168.12.0/24', which works exactly the way it looks (there is at least one other known inconsistency in the short address syntax processing).

After the bug fix is in place, it should be practicable to add tests for it in order to detect such issues automatically.

infrastation avatar Feb 15 '25 15:02 infrastation

pcap_nametonetaddr() uses three different implementations of getnetbyname_r()

That's because there's no standard for getnetbyname_r() - it's not in POSIX, as of the latest issue, Issue 8 - so GNU Libc, Sun/Oracle, and IBM have produced versions with three different APIs.

I wrote a small program to test getnetbyname(), and ran it on FreeBSD 14.1 with an /etc/networks file containing

subnet1	127.0.1	alias1	# comment 1

The program just printed the n_net member in hex; what it reported was that subnet1 was 0x007f0001.

When I ran it on Ubuntu 24.04, with an /etc/networks file containing

subnet	127.0.1

the program reported that subnet was 0x7f000100.

What POSIX says about n_net is that it's "The network number, in host byte order." That's what the Ubuntu man page said as well. The FreeBSD man page says that it's "The network number. Network numbers are returned in machine byte order.", i.e., the same thing, but with more words.

That's network number, so getnetbyname() is behaving as documented on FreeBSD and NOT behaving as documented on Ubuntu 24.04 (and probably on any Linux using GNU libc).

For what it's worth, inet_network() appears to work the same on macOS and Linux - it converts "127.0.1" to 0x007f0001 - so it looks as if the Linux getnetbyname() is either not using inet_network() to convert dotted-whatever network numbers or is "fixing" it. I'm not sure where the "files" NSS module is, so I'm not sure where the code that parses /etc/networks is. I'm just surprised there's nothing obvious on the Intertubes about this behavior.
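For readers who want to see where 0x007f0001 comes from, the following is a simplified sketch of inet_network()-style parsing. It is an approximation for illustration only: the real function also accepts octal and hexadecimal components, and `inet_network_like` is a made-up name.

```python
def inet_network_like(text):
    """Pack 1 to 4 dotted decimal components into the LOW-order bytes.

    Simplified model: no trailing zero bytes are added, so a
    three-component string yields a 24-bit network number.
    """
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"bad network number: {text!r}")
    number = 0
    for part in parts:
        value = int(part)
        if not 0 <= value <= 255:
            raise ValueError(f"component out of range: {part!r}")
        number = (number << 8) | value
    return number


print(f"{inet_network_like('127.0.1'):#010x}")  # 0x007f0001
```

The FreeBSD result above matches this model, whereas the Ubuntu result (0x7f000100) is the same value shifted left by one byte.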

guyharris avatar Feb 15 '25 19:02 guyharris

Thank you for testing. Do you mean macOS and FreeBSD work the same and both return 0x007f0001?

infrastation avatar Feb 15 '25 22:02 infrastation

In GNU libc nss_files/files-network.c says:

   /* 'inet_network' does not add zeroes at the end if the network number
      does not four byte values.  We add them ourselves if necessary.  */
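The effect that comment describes can be sketched as follows. This is an illustration of the idea (pad the string to four components, then convert), not the GNU libc code; both function names are hypothetical.

```python
def pad_to_dotted_quad(text):
    """Append ".0" until the string has four dot-separated components."""
    missing = 4 - len(text.split("."))
    return text + ".0" * max(missing, 0)


def to_number(text):
    """Pack dotted decimal components into an integer, high byte first."""
    number = 0
    for part in text.split("."):
        number = (number << 8) | int(part)
    return number


entry = "127.0.1"
print(f"{to_number(entry):#010x}")                      # 0x007f0001
print(f"{to_number(pad_to_dotted_quad(entry)):#010x}")  # 0x7f000100
```

The second value is what the test program reported on Ubuntu 24.04, i.e. a zero-padded address rather than a network number.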

In musl libc src/network/netname.c does not have this specific problem:

struct netent *getnetbyname(const char *name)
{
        return 0;
}

infrastation avatar Feb 15 '25 22:02 infrastation

Thank you for testing. Do you mean macOS and FreeBSD work the same and both return 0x007f0001?

Unfortunately, they don't. It appears that macOS 13.6.9, at least, seems to treat the first octet of a dotted-whatever as the network number, e.g.

subnet		192.0.1

reports 0x000000c0.

guyharris avatar Feb 15 '25 22:02 guyharris

Old-style network classes may be a factor.

infrastation avatar Feb 15 '25 22:02 infrastation

In GNU libc nss_files/files-network.c says:

Oh, that's where they hid the "files" module's getnetent code!

guyharris avatar Feb 15 '25 23:02 guyharris

Old-style network classes may be a factor.

They may be, but that's not an excuse for Darwin - 192.0.1.x would be a class C network, so the network number would be 0x00c00001, not 0x000000c0. Time to dig into libinfo source, I guess.

guyharris avatar Feb 15 '25 23:02 guyharris

Time to dig into libinfo source, I guess.

Darwin uses atoi() on the second token in /etc/networks lines. This is a bug, given that the networks(5) man page on macOS says:

 Network number may be specified in the conventional ``.''  (dot) notation
 using the inet_network(3) routine from the Internet address manipulation
 library, inet(3).  Network names may contain any printable character
 other than a field delimiter, newline, or comment character.
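A quick sketch of why atoi() produces 0x000000c0 for that entry: it stops at the first character that is not part of a decimal integer. The helper below imitates that behaviour in Python for illustration; `atoi_like` is a made-up name and this is not the libinfo code.

```python
import re


def atoi_like(text):
    """Imitate C atoi(): skip leading whitespace, read an optional sign
    and decimal digits, ignore everything after them, return 0 if no
    digits are found."""
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else 0


print(f"{atoi_like('192.0.1'):#010x}")  # 0x000000c0
```

Everything after the first dot is discarded, so only the first octet survives.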

I shall file a "feedback" with Apple and hope that it turns into a Radar.

guyharris avatar Feb 16 '25 01:02 guyharris

Feedback submitted.

guyharris avatar Feb 16 '25 19:02 guyharris

So, at least at present:

  • using an /etc/networks network name with "net" on Linux with GNU libc produces a bogus filter;
  • using an /etc/networks network name with "net" on macOS produces a different type of bogus filter;
  • using an /etc/networks network name with "net" on Linux with musl produces an error.

Not exactly great behavior; perhaps this should be documented.

guyharris avatar Feb 16 '25 19:02 guyharris

Yes, documenting this would be a good first step.

infrastation avatar Feb 17 '25 00:02 infrastation

Another useful starting point could be identifying OSes that do it right, at least in some cases if not always, and adding tests for those patches of correct behaviour.

infrastation avatar Feb 19 '25 00:02 infrastation

Another useful starting point could be identifying OSes that do it right, at least in some cases if not always, and adding tests for those patches of correct behaviour.

At this point, my guess would be that:

  • all the *BSDs do it correctly, unless they broke it when adding support for the Name Service Switch (NSS), as they probably started from the 4.xBSD code;
  • AIX and Solaris probably also do it correctly, with the same qualification, for much the same reason.

Windows doesn't even have getnetbyname(). macOS has the bug already mentioned (they broke it either when converting it to the old-style files+Directory Service stuff or the new-style files+Directory Service, unless NeXT broke it and passed that on to Apple when Apple bought NeXT). Linux distributions have a choice of at least two differently-broken versions (GNU libc and musl).

guyharris avatar Feb 19 '25 01:02 guyharris

Also, if the "network number" of a network is defined to be "if you take the IPv4 address of any host on the network, AND it with the net mask, and then shift it right by as many low-order zero bits as there are in the net mask", then:

  • a 3-component /etc/networks entry is a netmask for a class C network;
  • a 2-component /etc/networks entry is a netmask for a class B network;
  • a 1-component /etc/networks entry is a netmask for a class A network;
  • there is no provision for any CIDR network with a net mask other than 0xff000000, 0xffff0000, or 0xffffff00;

in which case the usefulness of /etc/networks is not as great as it was back before CIDR.
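That definition of a "network number" translates directly into a few bit operations. The sketch below is an illustration of the definition as worded above; `network_number` and the sample addresses are hypothetical.

```python
def network_number(address, netmask):
    """AND the address with the netmask, then shift right by the number
    of low-order zero bits in the netmask."""
    if netmask == 0:
        raise ValueError("netmask must have at least one bit set")
    # Isolate the lowest set bit of the mask; its position is the shift count.
    shift = (netmask & -netmask).bit_length() - 1
    return (address & netmask) >> shift


# 192.0.1.77 on a class C network: network number 0x00c00001.
print(f"{network_number(0xC000014D, 0xFFFFFF00):#010x}")
# 10.1.2.3 on a class A network: network number 0x0000000a.
print(f"{network_number(0x0A010203, 0xFF000000):#010x}")
```

The first result is the value expected for 192.0.1 earlier in the thread, as opposed to the 0x000000c0 that macOS returned.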

guyharris avatar Feb 19 '25 01:02 guyharris

This definition matches networks(5) on Linux, as well as the idea I used to have about the format. The actual behaviour needs a bit more mapping.

infrastation avatar Feb 19 '25 02:02 infrastation

Tested on:

  • FreeBSD 14.1-RELEASE - works as expected;
  • DragonFly 6.4-RELEASE - works as expected;
  • NetBSD 9.4 - works as expected;
  • OpenBSD 7.5 - doesn't use /etc/networks, uses /etc/hosts, and you need to have all four components, so 192.168.1, for example, doesn't work, and if you do 192.168.1.0, it gives you 0xc0a80100, and, no, it doesn't support CIDR /n notation;
  • Solaris 11.4 - works as expected;
  • Haiku R1 beta 4(?) - for 192.168.1, the program gives you 0xc0a80001, which is Yet Another Ridiculous Failure Mode.

guyharris avatar Feb 19 '25 05:02 guyharris

So it's beginning to look as if we should strongly recommend against using network names for the "net" keyword, and maybe even deprecate support for that and eventually remove it.

guyharris avatar Feb 19 '25 05:02 guyharris

For what it's worth, I've submitted https://sourceware.org/bugzilla/show_bug.cgi?id=32719 for GNU libc.

guyharris avatar Feb 19 '25 05:02 guyharris

Speaking of "get rid of getnet*" - https://www.mail-archive.com/[email protected]/msg04250.html, from May 2021.

guyharris avatar Feb 19 '25 05:02 guyharris

And DEC/Compaq/whoever marked it as obsolete in the Tru64 UNIX documentation a while ago - https://www3.physnet.uni-hamburg.de/physnet/Tru64-Unix/HTML/MAN/MAN3/0979____.HTM.

guyharris avatar Feb 19 '25 05:02 guyharris

Thank you for the analysis. I agree it looks most useful at least to make net NAME an off-by-default syntax in libpcap. Only the four-part dotted-quad notation looks portable, which in libpcap means an alias for host NAME, except it resolves the name from a different source.

Regarding which implementation is correct and which is not, let's say /etc/networks contains the following:

subnet4 127.1.2.0
subnet3 127.1.2
subnet2 127.1
subnet1 127

The Linux man page says:

The trailing ".0" (for the host component of the network address) may be omitted.

Using this definition, the first two networks should be the same in IPv4 sense and the last two networks should be equal in IPv4 sense to the following:

subnet42 127.1.0.0
subnet41 127.0.0.0

On Linux with GNU libc (Debian 12) this seems to hold:

# subnet4 127.1.2.0
$ ./testprogs/filtertest IPv4 'src net subnet4'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f010200      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

# subnet3 127.1.2
$ ./testprogs/filtertest IPv4 'src net subnet3'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f010200      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

# subnet2 127.1
$ ./testprogs/filtertest IPv4 'src net subnet2'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f010000      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

# subnet1 127
$ ./testprogs/filtertest IPv4 'src net subnet1'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f000000      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

# subnet42 127.1.0.0
$ ./testprogs/filtertest IPv4 'src net subnet42'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f010000      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

# subnet41 127.0.0.0
$ ./testprogs/filtertest IPv4 'src net subnet41'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f000000      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

This way, as far as the numbers look, GNU libc implementation is consistent with Linux man page, but there is no indication of the mask length, so in libpcap all of those are /32 matches.

On FreeBSD networks(5) says:

Network numbers may be specified in the conventional ``.'' (dot) notation using the inet_network(3) routine from the Internet address manipulation library, inet(3).

In turn, inet_network(3) discusses fewer-than-four-parts IPv4 string format, but only with regard to inet_aton() and inet_addr(), where the host number can be up to 24-bit wide. It does not say how inet_network() is supposed to parse such addresses. A quick test confirms that the results are different from Linux and in some cases look off otherwise (could be a libpcap bug though):

# Address looks correct, netmask looks correct.
# subnet4 127.1.2.0
$ ./testprogs/filtertest IPV4 'src net subnet4'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f010200      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

# Address looks correct, netmask looks correct.
# subnet3 127.1.2
$ ./testprogs/filtertest IPV4 'src net subnet3'
(000) ld       #0x0
(001) ld       [12]
(002) and      #0xffffff00
(003) jeq      #0x7f010200      jt 4	jf 5
(004) ret      #262144
(005) ret      #0

# Address looks incorrect, netmask looks incorrect.
# subnet2 127.1
$ ./testprogs/filtertest IPV4 'src net subnet2'
(000) ld       #0x0
(001) ld       [12]
(002) and      #0xffffff00
(003) jeq      #0x7f000200      jt 4	jf 5
(004) ret      #262144
(005) ret      #0

# Address looks incorrect, netmask looks incorrect.
# subnet1 127
$ ./testprogs/filtertest IPV4 'src net subnet1'
(000) ld       #0x0
(001) ld       [12]
(002) and      #0xffffff00
(003) jeq      #0x7f000100      jt 4	jf 5
(004) ret      #262144
(005) ret      #0

# Address looks correct, lack of netmask looks correct.
# subnet42 127.1.0.0
$ ./testprogs/filtertest IPV4 'src net subnet42'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f010000      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

# Address looks correct, lack of netmask looks correct.
# subnet41 127.0.0.0
$ ./testprogs/filtertest IPV4 'src net subnet41'
(000) ld       #0x0
(001) ld       [12]
(002) jeq      #0x7f000000      jt 3	jf 4
(003) ret      #262144
(004) ret      #0

This way, one dimension of this problem space is that different OSes define the syntax of /etc/networks differently and with varying levels of detail; another dimension is whether each OS correctly implements, for every edge case, what it documents, all the more so after any potential future bug fixes. This complication needs to be mentioned to discourage users from using the net NAME syntax now, and from enabling it once it becomes off by default.

infrastation avatar Feb 19 '25 14:02 infrastation

This way, as far as the numbers look, GNU libc implementation is consistent with Linux man page, but there is no indication of the mask length, so in libpcap all of those are /32 matches.

The getnet* APIs just provide a "network number", not a netmask, so libpcap has to infer the netmask, which it does by using the class A/B/C rules. That fails on Linux, as it's getting a masked-out IP address rather than a "network number". It also fails if it's a network that's neither class A nor B nor C.
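The difference between receiving a "network number" and receiving a zero-padded address can be illustrated with a left-justify heuristic. This is an assumption made for illustration (one heuristic that is consistent with the filter programs shown earlier in this thread), not a quote of libpcap source; `left_justify` is a hypothetical name.

```python
def left_justify(net):
    """Shift a short network number left one byte at a time until its
    top byte is non-zero, growing the netmask by the same amount."""
    mask = 0xFFFFFFFF
    while net != 0 and (net & 0xFF000000) == 0:
        net = (net << 8) & 0xFFFFFFFF
        mask = (mask << 8) & 0xFFFFFFFF
    return net, mask


# A true network number for 127.1.2 (as returned on FreeBSD): becomes a /24.
print([f"{v:#010x}" for v in left_justify(0x007F0102)])
# The zero-padded value for 127.1.2 (as returned by GNU libc): stays a /32.
print([f"{v:#010x}" for v in left_justify(0x7F010200)])
```

Under this assumption the zero-padded value never enters the loop, because its top byte is already non-zero, so no netmask is inferred.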

What we could do, I guess, is have a routine that takes a string and converts it to a netmask and an address with bits not in the netmask zeroed out; it would:

  • convert a dotted n-tuple to an address and a netmask based on 1) appending .0s as needed to make it a dotted quad and 2) doing the class A/B/C stuff;
  • convert a dotted n-tuple followed by "/m" to an address and a netmask based on 1) appending .0s as necessary to make it a dotted quad and 2) using m as the mask length;
  • convert anything else to an address and a netmask based on:
    • on Linux and OpenBSD, using getnetbyname(), assuming it's appended the .0s, and doing the class A/B/C stuff to get a netmask;
    • on most other systems, using getnetbyname(), assuming it returned a network number, and doing the "Promote short net number" stuff to get an address and a netmask;
    • on macOS, doing the same and hoping for the best (macOS may well do the right thing if the lookup isn't a lookup on /etc/networks);
    • on Haiku, not sure, I need to look at the code to see why it's being weird;
  • and, if the lookup fails, failing.
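The first two bullets of the routine outlined above could look roughly like the sketch below. It is a hedged illustration of the proposal, not an implementation from libpcap: the name `parse_net`, the decimal-only parsing and the decision to reject class D/E first octets are all assumptions, and the getnetbyname() branches are deliberately left out.

```python
def parse_net(text):
    """Return (address, netmask) with the bits outside the netmask zeroed.

    Accepts a dotted n-tuple, optionally followed by "/m".
    """
    literal, slash, length = text.partition("/")
    parts = [int(p) for p in literal.split(".")]
    if not 1 <= len(parts) <= 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted IPv4 network: {text!r}")
    # Append zero components to make a dotted quad.
    address = int.from_bytes(bytes(parts + [0] * (4 - len(parts))), "big")
    if slash:
        bits = int(length)
        if not 0 <= bits <= 32:
            raise ValueError(f"bad mask length: {length!r}")
        netmask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    elif parts[0] < 128:      # class A
        netmask = 0xFF000000
    elif parts[0] < 192:      # class B
        netmask = 0xFFFF0000
    elif parts[0] < 224:      # class C
        netmask = 0xFFFFFF00
    else:                     # class D/E: no classful netmask (assumption)
        raise ValueError(f"no classful netmask for {text!r}")
    return address & netmask, netmask


for example in ("192.168.12", "127.1", "10.1.2/23"):
    address, netmask = parse_net(example)
    print(f"{example:12} address={address:#010x} netmask={netmask:#010x}")
```

Note that, taken literally, the class rules turn 127.1 into 127.0.0.0/8, which may or may not be what a user who wrote two components meant.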

guyharris avatar Feb 19 '25 20:02 guyharris

The simplest way to avoid debugging Haiku bugs that have been fixed already would be making a new installation of R1/beta5 and upgrading it to a development snapshot.

That said, off-by-default seems a more sustainable long term approach.

infrastation avatar Feb 19 '25 20:02 infrastation

Also I thought that covering the existing implementation with tests would provide a good sense of the difficulty of implementing the better portability/heuristics you suggested. It would also reduce the chances of trading one bug for another whilst making the changes. To that end, there are a few half-complete tests in my working copy for the relatively portable (although useless) dotted-quad syntax; that part should not take long to commit.

infrastation avatar Feb 20 '25 01:02 infrastation

I think that "sudo rm /etc/networks" is probably the right solution to this problem.

I don't think libpcap should attempt to work around this 1982 technology.

mcr avatar Feb 20 '25 01:02 mcr

So it's beginning to look as if we should strongly recommend against using network names for the "net" keyword, and maybe even deprecate support for that and eventually remove it.

+1

fxlb avatar Feb 20 '25 07:02 fxlb

I think that "sudo rm /etc/networks" is probably the right solution to this problem.

From man networks:

       This file is read by the route(8) and netstat(8) utilities.  Only Class
       A,  B,  or  C  networks are supported, partitioned networks (i.e., net-
       work/26 or network/28) are not supported by this file.

Removing it could have side effects.

I don't think libpcap should attempt to work around this 1982 technology.

+1

fxlb avatar Feb 20 '25 08:02 fxlb

The simplest way to avoid debugging Haiku bugs that have been fixed already would be making a new installation of R1/beta5 and upgrading it to a development snapshot.

It turns out that my Haiku VM is running beta 5, not beta 4. (uname doesn't report that in any obvious way, but the "About" menu reports it.)

guyharris avatar Feb 20 '25 08:02 guyharris