Question: NAT Setup

jatsrt opened this issue on Nov 23 '19 · 85 comments

I seem to be missing something important. If I set up a mesh of hosts that all have direct public IP addresses, it works fine. However, if I have a network with a lighthouse (public IP) and all nodes behind NAT, the nodes will not connect to each other. The lighthouse is able to communicate with all hosts, but hosts are not able to communicate with each other.

Watching the logs, I see connection attempts to both the NAT public IP and the private IPs.

I have enabled punchy and punch back, but it does not seem to help.

Hope it is something simple?

jatsrt · Nov 23 '19

Also, to note: in this setup all nodes are behind different NATs on different networks. It is hub and spoke, with the hub being the lighthouse and the spokes being hosts on different networks.

jatsrt · Nov 23 '19

My best guess (because I just messed this up in a live demo) is that am_lighthouse may be set to "true" on the individual nodes.

Either way, can you post your lighthouse config and one of your node configs?

(feel free to replace any sensitive IP/config bits, just put consistent placeholders in their place)
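
For reference, the lighthouse block on a non-lighthouse node should look roughly like this (a sketch using the 192.168.100.1 nebula IP that appears in the configs below, not anyone's exact config):

lighthouse:
  am_lighthouse: false   # true only on the lighthouse itself
  interval: 60
  hosts:
    - "192.168.100.1"    # the lighthouse's nebula IP, not its public IP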

rawdigits · Nov 23 '19

Hi, I have the same issue. My lighthouse is on a DigitalOcean droplet with a public IP. My MacBook and Linux laptop at home are on the same network, both connected to the lighthouse. I can ping the lighthouse from both laptops, but I cannot ping from one laptop to the other.

Lighthouse config

pki:
  ca: /data/cert/nebula/ca.crt
  cert: /data/cert/nebula/lighthouse.crt
  key: /data/cert/nebula/lighthouse.key
static_host_map:
  "192.168.100.1": ["LIGHTHOUSE_PUBLIC_IP:4242"]
lighthouse:
  am_lighthouse: true
  interval: 60
  hosts:
listen:
  host: 0.0.0.0
  port: 4242
punchy: true
tun:
  dev: neb0
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
logging:
  level: info
  format: text
firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 443
      proto: tcp
      groups:
        - laptop

Macbook config

pki:
  ca: /Volumes/code/cert/nebula/ca.crt
  cert: /Volumes/code/cert/nebula/mba.crt
  key: /Volumes/code/cert/nebula/mba.key
static_host_map:
  "192.168.100.1": ["LIGHTHOUSE_PUBLIC_IP:4242"]
lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
  - "LIGHTHOUSE_PUBLIC_IP"
punchy: true
tun:
  dev: neb0
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
logging:
  level: debug
  format: text
firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 443
      proto: tcp
      groups:
        - laptop

Linux laptop config

pki:
  ca: /data/cert/nebula/ca.crt
  cert: /data/cert/nebula/server.crt
  key: /data/cert/nebula/server.key
static_host_map:
  "192.168.100.1": ["LIGHTHOUSE_PUBLIC_IP:4242"]
lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
  - "LIGHTHOUSE_PUBLIC_IP"
punchy: true
listen:
  host: 0.0.0.0
  port: 4242
tun:
  dev: neb0
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1300
logging:
  level: info
  format: text
firewall:
  conntrack:
    tcp_timeout: 120h
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 443
      proto: tcp
      groups:
        - laptop

nfam · Nov 23 '19

@nfam thanks for sharing the config. My next best guess is that NAT isn't reflecting and for some reason the nodes also aren't finding each other locally.

Try setting the local_range config setting on the two laptops, which can give them a hint about the local network range to use for establishing the direct tunnel.
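
local_range is a top-level setting, e.g. (a sketch; substitute the laptops' actual home subnet):

local_range: "192.168.1.0/24"   # hint about the LAN range so nebula prefers local addresses for the direct tunnel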

rawdigits · Nov 23 '19

@nfam similar setup. Public lighthouse on DigitalOcean, laptop behind a home NAT, and server in AWS behind a NAT. Local and AWS are using different private ranges (though overlap should be handled).

jatsrt · Nov 23 '19

@rawdigits setting local_range does not help. I stopped nebula on both laptops, set the log level on the lighthouse to debug, cleared the log, and restarted the lighthouse (with no nodes connected to it). Following is the log I got.

time="2019-11-23T20:05:18Z" level=info msg="Main HostMap created" network=192.168.100.1/24 preferredRanges="[]" time="2019-11-23T20:05:18Z" level=info msg="UDP hole punching enabled" time="2019-11-23T20:05:18Z" level=info msg="Nebula interface is active" build=1.0.0 interface=neb0 network=192.168.100.1/24 time="2019-11-23T20:05:18Z" level=debug msg="Error while validating outbound packet: packet is not ipv4, type: 6" packet="[96 0 0 0 0 8 58 255 254 128 0 0 0 0 0 0 183 226 137 252 10 196 21 15 255 2 0 0 0 0 0 0 0 0 0 0 0 0 0 2 133 0 27 133 0 0 0 0]"

nfam · Nov 23 '19

My config:

nebula-cert sign -name "lighthouse" -ip "192.168.100.1/24"
nebula-cert sign -name "laptop" -ip "192.168.100.101/24" -groups "laptop"
nebula-cert sign -name "server" -ip "192.168.100.201/24" -groups "server"

Lighthouse:

pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/lighthouse.crt
  key: /etc/nebula/lighthouse.key

static_host_map:
  "192.168.100.1": ["167.71.175.250:4242"]

lighthouse:
  am_lighthouse: true
  interval: 60

listen:
  host: 0.0.0.0
  port: 4242

punchy: true

tun:
  dev: nebula1
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any

Laptop:

pki:
  # The CAs that are accepted by this node. Must contain one or more certificates created by 'nebula-cert ca'
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/laptop.crt
  key: /etc/nebula/laptop.key

static_host_map:
  "192.168.100.1": ["167.71.175.250:4242"]

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.100.1"

listen:
  host: 0.0.0.0
  port: 0

punchy: true

tun:
  dev: nebula1
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any

Server:

pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/server.crt
  key: /etc/nebula/server.key

static_host_map:
  "192.168.100.1": ["167.71.175.250:4242"]

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.100.1"

listen:
  host: 0.0.0.0
  port: 0

punchy: true

tun:
  dev: nebula1
  mtu: 1300

logging:
  level: info
  format: text

firewall:
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
    max_connections: 100000

  outbound:
    - port: any
      proto: any
      host: any

  inbound:
    - port: any
      proto: icmp
      host: any

With this setup, both the server and the laptop can ping the lighthouse, and the lighthouse can ping the server and the laptop, but the laptop cannot ping the server and the server cannot ping the laptop.

I get messages such as this as it's trying to make the connection:

INFO[0006] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3339283633 remoteIndex=0 udpAddr="18.232.11.42:4726" vpnIp=192.168.100.201
INFO[0007] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3339283633 remoteIndex=0 udpAddr="172.31.106.61:37058" vpnIp=192.168.100.201
INFO[0009] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3339283633 remoteIndex=0 udpAddr="18.232.11.42:4726" vpnIp=192.168.100.201
INFO[0011] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3339283633 remoteIndex=0 udpAddr="172.31.106.61:37058" vpnIp=192.168.100.201
INFO[0012] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3339283633 remoteIndex=0 udpAddr="18.232.11.42:4726" vpnIp=192.168.100.201
INFO[0014] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3339283633 remoteIndex=0 udpAddr="172.31.106.61:37058" vpnIp=192.168.100.201
INFO[0016] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=3339283633 remoteIndex=0 udpAddr="18.232.11.42:4726" vpnIp=192.168.100.201

jatsrt · Nov 23 '19

@nfam similar error here, though I'm not sure it's the problem:

DEBU[0066] Error while validating outbound packet: packet is not ipv4, type: 6 packet="[96 0 0 0 0 8 58 255 254 128 0 0 0 0 0 0 139 176 20 9 146 65 14 250 255 2 0 0 0 0 0 0 0 0 0 0 0 0 0 2 133 0 60 66 0 0 0 0]"

jatsrt · Nov 23 '19

@jatsrt

The Error while validating outbound packet messages can mostly be ignored. Those are just some types of packets nebula doesn't support sending over the tunnel (type 6 here is IPv6).

As for the handshakes, for some reason hole punching isn't working. A few things to try:

  1. Add punch_back: true on the "server" and "laptop" nodes (see the sketch after this list).
  2. Explicitly allow all inbound UDP to the "server" node from the internet (via AWS security groups, just as a test).
  3. Verify iptables isn't blocking anything.
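
For item 1, that means adding these top-level options to the node configs (a sketch in the boolean form this thread already uses):

punchy: true
punch_back: true   # ask the remote to punch back toward us if our punches don't get through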

Also, it appears the logs with the handshake messages are from the laptop? If so, can you also share the nebula logs from the server as it tries to reach the laptop?

Thanks!

rawdigits · Nov 23 '19

Aha, @nfam I think I spotted the config problem.

instead of

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
  - "LIGHTHOUSE_PUBLIC_IP"

it should be

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
  - "192.168.100.1"

rawdigits · Nov 23 '19

Adding #40 to cover the accidental misconfiguration noted above.

rawdigits · Nov 23 '19

@rawdigits yes, it is. Now both laptops can ping each other. Thanks!

nfam · Nov 23 '19

@rawdigits

  1. Added punch_back: true on "server" and "laptop".
  2. The security group for that node is currently wide open for all protocols.
  3. No iptables rules on any of these nodes; base Ubuntu Server used for testing.

Server log:

time="2019-11-24T00:25:21Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="96.252.12.10:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:22Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="96.252.12.10:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:22Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="96.252.12.10:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:23Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="96.252.12.10:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:24Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="192.168.0.22:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:25Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="96.252.12.10:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:26Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="192.168.0.22:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:27Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="96.252.12.10:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:28Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="192.168.0.22:51176" vpnIp=192.168.100.101
time="2019-11-24T00:25:30Z" level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1689969496 remoteIndex=0 udpAddr="96.252.12.10:51176" vpnIp=192.168.100.101

jatsrt · Nov 24 '19

So, I tried a few more setups, and it comes down to this: if the two hosts trying to communicate with each other are on different networks and both behind NAT, it will not work.
If the lighthouse does not facilitate the communication/tunneling, this would make sense, but is it meant to be a limitation?

jatsrt · Nov 24 '19

The dual-NAT scenario is a bit tricky; there is possibly room for improvement on nebula's side there. Do you have details on the types of NAT you are dealing with?

nbrownus · Nov 24 '19

@nbrownus nothing crazy. I've tried multiple AWS VPC NAT gateways with hosts behind them, and they cannot connect. I've also tried a "home" NAT (Google WiFi router based NAT) with no success.

From a networking perspective I get why it's "tricky"; I was hoping there was some trick nebula was doing.

jatsrt · Nov 24 '19

@rawdigits can speak to the punching better than I can. If you are having problems in AWS then we can get a test running and sort out the issues.

nbrownus · Nov 24 '19

Yeah, so all my tests have had at least one host behind an AWS NAT Gateway

jatsrt · Nov 24 '19

Long shot, but one more thing to try until I set up an AWS NAT GW: set the UDP listen port on all nodes to 4242 and let NAT remap it. One ISP I've dealt with blocks the random ephemeral UDP ports above 32,000, presumably because they think every high UDP port is BitTorrent.
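
i.e. on every node, pin the listen port instead of using 0 (a sketch; 4242 is just the port the lighthouse already uses):

listen:
  host: 0.0.0.0
  port: 4242   # fixed source port; the NAT may still remap it, but it avoids high ephemeral ports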

Probably won't work, but it's easy to test.

rawdigits · Nov 24 '19

@rawdigits same issue

Network combination:

  • Lighthouse: DigitalOcean NYC3, public IP
  • Server: AWS Oregon, private VPC with AWS NAT Gateway (172.31.0.0/16)
  • Laptop: Verizon FiOS with Google WiFi router NAT (192.168.1.0/24)
  • Server2 (added later to test): AWS Ohio, private VPC with AWS NAT Gateway (10.200.200.0/24)

I added a second server in a different VPC on AWS to remove the FiOS variable and had the same results, with server and server2 trying to communicate:

INFO[0065] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=760525141 remoteIndex=0 udpAddr="172.31.106.61:4242" vpnIp=192.168.100.201
INFO[0066] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=760525141 remoteIndex=0 udpAddr="18.232.11.42:42005" vpnIp=192.168.100.201
INFO[0067] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=760525141 remoteIndex=0 udpAddr="172.31.106.61:4242" vpnIp=192.168.100.201
INFO[0069] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=760525141 remoteIndex=0 udpAddr="18.232.11.42:42005" vpnIp=192.168.100.201
INFO[0071] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=760525141 remoteIndex=0 udpAddr="172.31.106.61:4242" vpnIp=192.168.100.201
INFO[0072] Handshake message sent                        handshake="map[stage:1 style:ix_psk0]" initiatorIndex=760525141 remoteIndex=0 udpAddr="18.232.11.42:42005" vpnIp=192.168.100.201

jatsrt · Nov 24 '19

@jatsrt I'll stand up a testbed this week to explore what may be the cause of the issue. Thanks!

rawdigits · Nov 25 '19

I have got the same situation:

node_A <----> lighthouse: OK
node_B <----> lighthouse: OK
node_A <----> node_B: does not work; they cannot ping each other.

But I found that node_A and node_B can communicate with each other ONLY if both are connected to the same router, such as the same WiFi router.

PS: punch_back: true is set on both node_A and node_B.

No firewall on node_A, node_B and lighthouse.

iamid0 · Nov 27 '19

Hole punching is very difficult and random.

fireapp · Nov 27 '19

I also can't get nebula to work properly when both nodes are behind a typical NAT (technically PAT), regardless of any port pinning I do in the config. They happily connect to the lighthouse I have in AWS, but it seems like something isn't working properly. I've got punchy and punch_back enabled on everything and it doesn't seem to help. I've tried setting the port on the nodes to 0, and also tried the same port that the lighthouse is listening on.

The nodes have no issues connecting to each other over the MPLS, but we don't want that (for performance reasons).

Edit: To add a bit more detail, even Meraki's AutoVPN can't deal with this. In their situation the "hub" needs to be told its public IP and a fixed port that is open inbound. I'd be fine with that as an option, and it may be the only reliable one if both nodes are behind different NATs.

Another option I had considered: what if we could use the lighthouses to hairpin traffic? I'd much rather pay AWS for the bandwidth than have to deal with unfriendly NATs everywhere.

spencerryan · Nov 27 '19

I did a bit more research, and it appears that the AWS NAT Gateway uses symmetric NAT, which isn't friendly to hole punching of any kind. NAT gateways also don't appear to support any type of port forwarding, so fixing this by statically assigning and forwarding a port doesn't appear to be an option.

A NAT instance would probably work, but I realize that's probably not a great option. One thing I recommend considering would be to give instances a routable IP address but disallow all inbound traffic. This wouldn't greatly change the security of your network, since you still aren't allowing any unsolicited packets to reach the hosts, but it would allow hole punching to work properly.

rawdigits · Nov 27 '19

I don't think NAT as such is the issue so much as PAT (port translation). Unfortunately, with PAT you can't predict what your public port will be, and hole punching becomes impossible if both ends are behind a similar PAT. I'm going to do some testing, but I think that as long as one of the two nodes has a 1:1 NAT (no port translation), a public IP directly on the node isn't a concern.

If I get particularly ambitious I may attempt to whip up some code in lighthouse to detect when one/both nodes are behind a PAT and throw a warning saying that this won't work out of the box.

spencerryan · Nov 28 '19

If I get particularly ambitious I may attempt to whip up some code in lighthouse to detect when one/both nodes are behind a PAT and throw a warning saying that this won't work out of the box

I've thought about this before. You need at least 2 lighthouses, and I think it's best to implement as a flag on the non-lighthouses (when you query the lighthouses for a host, if you get results with the same IP but different ports then you know the remote is problematic).
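
For the two-lighthouse part, the node side would simply list both (a sketch with placeholder addresses; the second lighthouse and the warning flag itself are hypothetical here):

static_host_map:
  "192.168.100.1": ["LIGHTHOUSE1_PUBLIC_IP:4242"]
  "192.168.100.2": ["LIGHTHOUSE2_PUBLIC_IP:4242"]

lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.100.1"
    - "192.168.100.2"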

wadey · Nov 28 '19

I haven't dug into the handshake code, but if you include the source port in the handshake, the lighthouse can compare that to what it sees. If they differ, you know something in the middle is doing port translation.

spencerryan · Nov 28 '19

I bet the lighthouse hosts: misconfiguration pointed out above is also my issue... will test it soon. That section is confusing 😕

jocull · Dec 08 '19

That was not a fix - I had it configured like this already. After more testing, I think what I have is a hole punching issue with my NAT.

  • The lighthouse is a DigitalOcean droplet with a public IP and port 4242 open via UFW. This seems fine.
  • My laptop is behind a regular consumer Netgear router with whatever NAT that has.
  • Even with punchy and punch back enabled I can't connect. I can see both the laptop and the lighthouse trying to handshake with each other endlessly. It seems like they are trying to punch back to each other and failing.
  • If I forward port 4242 on the router to my laptop's internal IP, things start to work fine (see the sketch below). But this kind of defeats the purpose of trying to use this in the first place.
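
For reference, the port-forward workaround pairs the router rule with a pinned listen port on the laptop (a sketch, not the exact config from this comment):

listen:
  host: 0.0.0.0
  port: 4242          # pinned so the router's "forward UDP 4242 -> laptop" rule lines up

punchy: true
punch_back: true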

jocull · Dec 08 '19