ip2unix icon indicating copy to clipboard operation
ip2unix copied to clipboard

Improve hash function for SockAddr

Open aszlig opened this issue 3 years ago • 0 comments

Right now the hash function uses strings to determine hash value. However, this is quite expensive and we can also do better in terms of avoiding collisions.

While the port and address family are the most important parts where we want to avoid collisions, the IP addresses also need to be hashed properly so that ideally we won't get any collision at all.

So here's how IP addresses are assigned starting from the most common cause to the least common:

  • ~~Remote peer determined via SO_PEERCRED:~~

    • ~~IPv4: Equal to PID (uint32_t)~~
    • ~~IPv6: 0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP (U -> UID; G -> GID; P -> PID)~~
  • ~~Implicit binding via connect or sendto/sendmsg:~~

    • ~~IPv4: Equal to PID (uint32_t)~~
    • ~~IPv6: 0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP (U -> UID; G -> GID; P -> PID)~~
  • If address is already a loopback address passed to eg. bind, keep it as is.

  • If it's an implicit binding done via recvfrom or recvmsg, we get local bindings with the following IP addresses:

    • IPv4: 0x00XXXXXX (X -> Random value)
    • IPv6: 0xfe8000000000000000000000XXXXXXXX (X -> Random value)

Since size_t is guaranteed to have at least 16 bits, it's probably safe to fall back to only encode the port if size_t is 16 bit, since we usually have one low port and several ephemeral ports. Also, we're only interested in Unix domain sockets, so distinguishing between IPv4 and IPv6 could even be neglected.

If size_t is 32 bits or more, it's certainly a good idea to encode the IP address as well. Since the PID is always the last part of the address for the two most common cases, it's probably safe to only encode the PID. Similarly even if we already got a loopback address or a random IP address, the last 32 bits are the interesting ones.

However, we still need the port and address family, so we certainly want to trim the IP address portion down to 16 bit if we have exactly 32 bits. Of course, on 64 bit systems, we can even use 48 bits, so collisions are even more unlikely to occur. What I'm not sure about here is which value to use for upper 16 bits of those 48 bits. 0xfe80? Or just a bit flag?

Update: I've removed the first two cases above, because they're not actually used in an unordered_map.

aszlig avatar Jul 20 '20 01:07 aszlig