Fix UDP listener on IPv4-only Linux

Open jilyaluk opened this issue 2 years ago • 17 comments

On some systems, IPv6 is disabled (for example, the CIS benchmark recommends disabling it when it is not used), but currently all UDP connections use AF_INET6 sockets. When we bind an AF_INET6 socket to an address like ::ffff:1.2.3.4 (net.ParseIP parses IPv4 addresses this way), we can't send or receive IPv6 packets anyway, so using an IPv4 socket for such addresses will not break any scenarios.
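
To illustrate the net.ParseIP behaviour mentioned above: an IPv4 address comes back in the 16-byte IPv4-mapped form, and To4() is the usual way to tell the two families apart. This is only a minimal sketch. socketFamily is a hypothetical helper made up for illustration, not a function from the nebula codebase, and it returns the family name as a string so the example runs on any platform.

```go
package main

import (
	"fmt"
	"net"
)

// socketFamily is a hypothetical helper (not part of nebula). It reports
// which socket family a given listen address would need.
func socketFamily(host string) (string, error) {
	ip := net.ParseIP(host)
	if ip == nil {
		return "", fmt.Errorf("not an IP address: %q", host)
	}
	// To4 returns non-nil for plain IPv4 and for IPv4-mapped
	// (::ffff:a.b.c.d) addresses, and nil for real IPv6 addresses.
	if ip.To4() != nil {
		return "AF_INET", nil
	}
	return "AF_INET6", nil
}

func main() {
	// net.ParseIP stores IPv4 addresses in the 16-byte IPv4-mapped form.
	fmt.Println(len(net.ParseIP("1.2.3.4"))) // 16

	for _, host := range []string{"1.2.3.4", "0.0.0.0", "::ffff:1.2.3.4", "::", "2001:db8::1"} {
		family, err := socketFamily(host)
		fmt.Println(host, family, err)
	}
}
```

With this check, 0.0.0.0 and ::ffff:1.2.3.4 map to AF_INET, while :: and 2001:db8::1 map to AF_INET6.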

Fixes #467

jilyaluk avatar Aug 03 '21 10:08 jilyaluk

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Aug 03 '21 10:08 CLAassistant

There is no default reviewers policy and I can't assign reviewers, so tagging @nbrownus @wadey

jilyaluk avatar Aug 03 '21 11:08 jilyaluk

Hi @jilyaluk, it doesn't work in my environment. When the lighthouse doesn't have an IPv6 address and the other node has one, the lighthouse shows this error:

level=error msg="Failed to send handshake message" cached=true error="listener is IPv4, but writing to IPv6 remote"

sndyuk avatar Sep 19 '21 13:09 sndyuk

> When the lighthouse doesn't have an IPv6 address and the other node has one, the lighthouse shows this error:

We need to update the PR to skip sending to IPv6 remotes when IPv6 is unsupported on the host.

wadey avatar Oct 12 '21 17:10 wadey

@jilyaluk there is a git conflict with the current master, could you please update?

squatica avatar Dec 12 '21 07:12 squatica

Any progress here? We want to switch to Nebula but need this issue fixed first. Unfortunately, I couldn't fix the remaining v6 issue myself in @jilyaluk's version, and by now it conflicts heavily with master... :(

danielb42 avatar Jan 04 '22 20:01 danielb42

Hey guys! Sorry about the delay, I've been pretty busy with other stuff lately.

I rebased my branch onto master and fixed the issue with sending packets from IPv4-only nodes to IPv6 addresses (it now logs a warning instead of returning an error).
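
Roughly, the "warn and skip" behaviour described above could look like the following. This is a sketch under assumptions, not the code from the branch: shouldSend and its parameters are hypothetical names, and only the warning text is taken from the logs in this thread.

```go
package main

import (
	"log"
	"net"
)

// shouldSend is a hypothetical helper (not nebula's API). It decides whether
// a packet for remote may be written to a socket that is IPv4-only or not.
func shouldSend(socketIsV4 bool, remote string) bool {
	ip := net.ParseIP(remote)
	if ip == nil {
		log.Printf("not sending to unparseable address: %q", remote)
		return false
	}
	// A nil To4 result means the remote is a real IPv6 address, which an
	// AF_INET socket cannot reach. Warn and skip instead of failing.
	if socketIsV4 && ip.To4() == nil {
		log.Printf("socket is IPv4-only, not sending to IPv6 address: %s", ip)
		return false
	}
	return true
}

func main() {
	for _, remote := range []string{"192.0.2.10", "2001:db8::1", "::"} {
		log.Printf("v4-only socket, remote %s: send=%v", remote, shouldSend(true, remote))
	}
}
```

Skipping only avoids the error; the peer still receives nothing at that address, so the remote address has to be right in the first place.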

jilyaluk avatar Jan 08 '22 09:01 jilyaluk

I'm afraid your patch still isn't working: a v6-less lighthouse no longer throws an error, but it still doesn't answer handshakes.

WARN[0000] socket is IPv4-only, not sending to IPv6 address: :: 
INFO[0000] Handshake message sent                        certName=laptop [...] handshake="map[stage:2 style:ix_psk0]" [...] udpAddr="[::]:8110" vpnIp=10.0.0.3

INFO[0001] Handshake message received                    certName=laptop [...] handshake="map[stage:1 style:ix_psk0]" [...] udpAddr="[::]:8110" vpnIp=10.0.0.3
                                                                                                                           
WARN[0001] socket is IPv4-only, not sending to IPv6 address: :: 
INFO[0001] Handshake message sent                        cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="[::]:8110" vpnIp=10.0.0.3

The "Handshake message sent" lines there seem to be wishful thinking; no replies are actually going out. I guess udpAddr="[::]:8110" means the code still assumes a v6 peer or socket. So just bailing out here (return whatever) doesn't help, as the peers won't receive replies from the lighthouse.

danielb42 avatar Jan 09 '22 01:01 danielb42

> I'm afraid your patch still isn't working: a v6-less lighthouse no longer throws an error, but it still doesn't answer handshakes.

What's the value of listen.host in configs? Setting it to 0.0.0.0 on both ends should work fine, AFAIR.

jilyaluk avatar Jan 09 '22 12:01 jilyaluk

Unfortunately nope, had listen.host 0.0.0.0 first, then static IPs, then combinations on lighthouse/clients ... doesn't change anything. Is there anything else I can try or provide to help?

danielb42 avatar Jan 09 '22 12:01 danielb42

In case you suspect another config value to be the issue, here's the whole diff between https://github.com/slackhq/nebula/blob/master/examples/config.yml and my testing lighthouse:

$ sdiff -s example_config.yaml config-srv1.yaml 
  cert: /etc/nebula/host.crt                                  |   cert: /etc/nebula/srv1.crt
  key: /etc/nebula/host.key                                   |   key: /etc/nebula/srv1.key
  "192.168.100.1": ["100.64.22.11:4242"]                      <
  am_lighthouse: false                                        |   am_lighthouse: true
    - "192.168.100.1"                                         <

(I also changed inbound firewall proto from icmp to any for some time, but unsurprisingly no luck with that.)

// The system is ubuntu 20.04, IPv6 completely disabled through /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 (....)"
GRUB_CMDLINE_LINUX="ipv6.disable=1"
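
For anyone reproducing this setup: with ipv6.disable=1, creating an AF_INET6 socket is expected to fail with EAFNOSUPPORT ("address family not supported by protocol"). Below is a small probe to confirm that on a given host. It is a sketch only: ipv6SocketAvailable is a hypothetical name, it is not how nebula detects IPv6, and it builds only on Unix-like systems because of the syscall package.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// ipv6SocketAvailable is a hypothetical probe (not part of nebula). It tries
// to create an AF_INET6 UDP socket and reports whether the kernel allows it.
func ipv6SocketAvailable() (bool, error) {
	fd, err := syscall.Socket(syscall.AF_INET6, syscall.SOCK_DGRAM, 0)
	if err != nil {
		// EAFNOSUPPORT is the "address family not supported" case.
		if errors.Is(err, syscall.EAFNOSUPPORT) {
			return false, nil
		}
		// Any other failure is reported to the caller unchanged.
		return false, err
	}
	// The socket was only a probe, so close it right away.
	if err := syscall.Close(fd); err != nil {
		return true, err
	}
	return true, nil
}

func main() {
	ok, err := ipv6SocketAvailable()
	fmt.Println("AF_INET6 sockets available:", ok, "error:", err)
}
```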

danielb42 avatar Jan 09 '22 12:01 danielb42

Huh, interesting. I'll try to reproduce the issue on my setup.

jilyaluk avatar Jan 09 '22 13:01 jilyaluk

I spun up two cloud nodes with vanilla ubuntu 20.04 and more bad news - I didn't even disable IPv6 on these nodes and see the exact same problem.

Here's the gist of the startup logs from your branch's binary vs. the nebula 1.5.2 release:

nebula 1.5.2:

...
INFO[0000] Nebula interface is active   build=1.5.2 interface=nebula1 network=10.0.0.1/24 udpAddr="0.0.0.0:4242"
...
INFO[0022] Handshake message received   certName=laptop ... handshake="map[stage:1 style:ix_psk0]" udpAddr="65.108.xx.xx:4242" vpnIp=10.0.0.2
INFO[0022] Handshake message sent       certName=laptop ... handshake="map[stage:2 style:ix_psk0]" udpAddr="65.108.xx.xx:4242" vpnIp=10.0.0.2

your patch (g7bf1698)

...
INFO[0000] Nebula interface is active   build=1.4.0--53-g7bf1698-dirty interface=nebula1 network=10.0.0.1/24 udpAddr="0.0.0.0:4242"
...
INFO[0039] Handshake message received   certName=laptop ... handshake="map[stage:1 style:ix_psk0]" udpAddr="[::]:4242" vpnIp=10.0.0.2
WARN[0039] socket is IPv4-only, not sending to IPv6 address: :: 
INFO[0039] Handshake message sent       certName=laptop ... handshake="map[stage:2 style:ix_psk0]" udpAddr="[::]:4242" vpnIp=10.0.0.2

As noted before, the only difference is that your branch states udpAddr="[::]:4242" while the official release (which I can check against here as IPv6 wasn't disabled) has udpAddr="65.108.xx.xx:4242", which is the public IP of the connecting client.

Hope this helps..

danielb42 avatar Jan 09 '22 23:01 danielb42

@jilyaluk I found the problem.

https://github.com/jilyaluk/nebula/blob/7bf1698b8a467eaedf3cd33cd2099938de79e5f1/udp/udp_linux.go#L163

For IPv4 clients, the client address must be read from names[i][4:8]. The indices read at the linked line hold mostly zeros for a v4 client, which are then of course interpreted as 0.0.0.0, or :: in this case.
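
For reference, here is the raw sockaddr layout behind those indices, as a small Go sketch. The layout is the standard Linux one (sockaddr_in keeps the 4-byte address at bytes 4..8, sockaddr_in6 keeps the 16-byte address at bytes 8..24, and both keep the port in network byte order at bytes 2..4). peerFromRawName and its isV4 parameter are hypothetical names for illustration, not the code in udp_linux.go.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// peerFromRawName is a hypothetical helper (not nebula's code). It decodes
// the peer address from a raw sockaddr buffer as filled in by the kernel.
// isV4 says whether the socket was opened as AF_INET.
func peerFromRawName(name []byte, isV4 bool) string {
	if len(name) < 24 {
		return "" // too short to hold either layout safely
	}
	// Both layouts store the port big-endian at bytes 2..4.
	port := int(binary.BigEndian.Uint16(name[2:4]))
	var ip net.IP
	if isV4 {
		ip = net.IP(name[4:8]) // sockaddr_in: 4-byte address
	} else {
		ip = net.IP(name[8:24]) // sockaddr_in6: 16-byte address
	}
	return (&net.UDPAddr{IP: ip, Port: port}).String()
}

func main() {
	// Simulate what an AF_INET socket reports for peer 192.0.2.10:4242
	// (the family field at bytes 0..2 is left out; it is not read here).
	name := make([]byte, 28)
	binary.BigEndian.PutUint16(name[2:4], 4242)
	copy(name[4:8], []byte{192, 0, 2, 10})

	fmt.Println(peerFromRawName(name, true))  // 192.0.2.10:4242
	fmt.Println(peerFromRawName(name, false)) // [::]:4242
}
```

Reading the v6 range from a v4 buffer gives exactly the udpAddr="[::]:4242" seen in the logs above.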

Your branch is now working for me (hooray!!) but I'm pretty sure that I broke IPv6 now by editing those indices to the v4-appropriate numbers. If you could wrap that section into something that is aware of v4/v6 and supports both correctly, I think your PR would be good to go upstream.

// Sent you a PR with my somewhat naive idea of solving the issue, i.e. read v4-indices when v6-indices are all zero.

danielb42 avatar Jan 10 '22 20:01 danielb42

Hey @danielb42, thanks for your investigation!

AFAIU, PrepareRawMessages always returns zeroed names (it only creates new arrays and doesn't fill them), so your check will always fall back to IPv4. However, we could use the simpler u.isV4 to check whether the current socket is IPv4. Pushed a fix; could you please check whether it works on your setup?

jilyaluk avatar Feb 01 '22 09:02 jilyaluk

Hi @jilyaluk, looks all good to me, works out of the box now.

But I'm still seeing log spam when IPv6 is not disabled and a v6 interface is up:

WARN[0080] socket is IPv4-only, not sending to IPv6 address: 2a01:xxx...::1

When ipv6 is disabled via grub/kernel, or if it's enabled but without an active interface, then that line does not appear.

danielb42 avatar Feb 04 '22 00:02 danielb42

I am probably doing something wrong, but I can't get this to work on a Raspberry Pi that is an IPv4-only host (not a lighthouse). The nebula 1.5.2 release does not work there and fails with the same error as reported in https://github.com/slackhq/nebula/issues/467 (unable to open socket: address family not supported by protocol).

I tried:

# First downloaded Go 1.18 from https://go.dev/
git clone https://github.com/jilyaluk/nebula.git
cd nebula && git checkout 7759c5dcd2e9badbc02de0c345204a0f374962bb
make all #can't work out how to only make rpi target

The resulting nebula binary starts up fine and seems to show handshakes sent/received OK:

INFO[0000] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=4159822821 udpAddrs="[...:4242]" vpnIp=10.10.10.1
INFO[0000] Handshake message received                    certName=lighthouse durationNs=291239297 fingerprint=... handshake="map[stage:2 style:ix_psk0]" initiatorIndex=4159822821 issuer=... remoteIndex=4159822821 responderIndex=2178296672 sentCachedPackets=1 udpAddr="...:4242" vpnIp=10.10.10.1

Likewise the lighthouse shows handshakes sent/received from it:

Mar 18 00:43:39 my-lighthouse-name nebula[505]: time="2022-03-18T00:43:39Z" level=info msg="Handshake message received" certName=rpi4 fingerprint=... handshake="map[stage:1 style:ix_psk0]" initiatorIndex=4159822821 issuer=... remoteIndex=0 responderIndex=0 udpAddr="...:4242" vpnIp=10.10.10.11
Mar 18 00:43:39 my-lighthouse-name nebula[505]: time="2022-03-18T00:43:39Z" level=info msg="Handshake message sent" certName=rpi4 fingerprint=... handshake="map[stage:2 style:ix_psk0]" initiatorIndex=4159822821 issuer=... remoteIndex=0 responderIndex=2178296672 sentCachedPackets=0 udpAddr="...:4242" vpnIp=10.10.10.11

But I can't access the host, not even ping:

$ ping 10.10.10.11
PING 10.10.10.11 (10.10.10.11): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2

I have the default firewall for ping on that host:

inbound:
  # Allow icmp between any nebula hosts
  - port: any
    proto: icmp
    host: any

The host I am pinging from shows some timeout messages when I try that ping (along with a lot of other "Handshake message sent" lines):

INFO[2971] Handshake timed out                           durationNs=8243892463 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=111188796 remoteIndex=0 udpAddrs="[...:4242]" vpnIp=10.10.10.1
INFO[2971] Handshake timed out                           durationNs=8443722364 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1581717362 remoteIndex=0 udpAddrs="[]" vpnIp=10.10.10.11

I can access another rpi host just fine; IPv6 is not disabled there, and it runs the current nebula 1.5.2 release:

$ ping 10.10.10.10
PING 10.10.10.10 (10.10.10.10): 56 data bytes
64 bytes from 10.10.10.10: icmp_seq=0 ttl=64 time=3.427 ms
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=3.031 ms

Am I using the right branch/checkout? Or doing something else wrong? Any idea when this patch might be integrated in a release?

Thanks for this patch, and for nebula, it is awesome.

dont-panic-42 avatar Mar 18 '22 01:03 dont-panic-42

Added a PR with the latest master:

https://github.com/slackhq/nebula/pull/787

perfecto25 avatar Nov 30 '22 21:11 perfecto25