New Relay public thread - Q&A and Issues discussions

Open mlsmaycon opened this issue 1 year ago • 64 comments

Hello folks, this issue is open to any questions or problems regarding the new relay implementation.

mlsmaycon avatar Sep 09 '24 17:09 mlsmaycon

Status information to confirm relay usage:

Peers detail:
 relay-test-ip-172-20-1-178-rly.netbird.selfhosted:
  NetBird IP: 100.89.101.6
  Public key: CdRpcUnzq2LM9v97VnU7JiiqE0Y4wXp379mXju0efjk=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address: rels://relay-eu1.stage.netbird.io <--------------- indicates the relay used to connect to the remote peer
  Last connection update: 2 seconds ago
  Last WireGuard handshake: 3 seconds ago
  Transfer status (received/sent) 92 B/180 B
  Quantum resistance: false
  Routes: -
  Latency: 0s

 relay-test-ip-172-20-14-148.netbird.selfhosted:
  NetBird IP: 100.89.212.227
  Public key: bhSrOMLvN+5cMnjWyL4gB+o9En2a1AvAGWNB5N+gEGw=
  Status: Connected
  -- detail --
  Connection type: P2P
  ICE candidate (Local/Remote): host/srflx
  ICE candidate endpoints (Local/Remote): 192.168.178.38:51820/1.2.3.4:51820
  Relay server address: rels://relay-eu2.stage.netbird.io <--------------- indicates the relay used to connect to the remote peer (known bug: this field should be cleared once the connection switches to P2P)
  Last connection update: 2 seconds ago
  Last WireGuard handshake: 3 seconds ago
  Transfer status (received/sent) 92 B/180 B
  Quantum resistance: false
  Routes: 34.160.111.145/32
  Latency: 28.5755ms

OS: darwin/arm64
Daemon version: 0.29.0
CLI version: 0.29.0
Management: Connected to https://test.stage.netbird.io:443
Signal: Connected to https://signal.stage.netbird.io:443
Relays:
  [stun:test.stage.netbird.io:3478] is Available
  [turn:test.stage.netbird.io:3478?transport=udp] is Available
  [rels://relay-eu1.stage.netbird.io] is Available    <--------------- indicates the relay used by your local client (the home relay)
Nameservers:
  [8.8.8.8:53, 8.8.4.4:53] for [.] is Available
FQDN: maycons-macbook-pro-2-1.netbird.selfhosted
NetBird IP: 100.89.107.107/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 2/2 Connected
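
The relay-related fields in a status dump like the one above can also be pulled out programmatically. The following Python sketch is only an illustration: it assumes the text layout shown above (a "Peers detail:" header, one indented block per peer, "key: value" lines), and the names parse_peers and relayed_peers are made up for this example; they are not part of NetBird.

```python
# Hypothetical helper: parses status text that follows the layout shown in this thread.
SAMPLE = """Peers detail:
 peer-a.netbird.selfhosted:
  NetBird IP: 100.89.101.6
  Status: Connected
  -- detail --
  Connection type: Relayed
  Relay server address: rels://relay-eu1.stage.netbird.io <--- relay used
  Latency: 0s

 peer-b.netbird.selfhosted:
  NetBird IP: 100.89.212.227
  Connection type: P2P
  ICE candidate endpoints (Local/Remote): 192.168.178.38:51820/1.2.3.4:51820

OS: darwin/arm64
"""


def parse_peers(status_text):
    """Return one dict per peer found in the 'Peers detail:' section."""
    peers, current, in_peers = [], None, False
    for raw in status_text.splitlines():
        # Drop the explanatory arrows that were added by hand in this thread.
        line = raw.split("<---")[0].rstrip()
        if line.strip() == "Peers detail:":
            in_peers = True
            continue
        if not in_peers or not line.strip():
            continue
        if not line.startswith(" "):
            break  # first unindented line (e.g. "OS: ...") ends the peers section
        text = line.strip()
        if text.endswith(":") and ": " not in text:
            # A bare "name:" line starts a new peer block.
            current = {"name": text[:-1]}
            peers.append(current)
        elif ": " in text and current is not None:
            key, value = text.split(": ", 1)
            current[key] = value.strip()
    return peers


def relayed_peers(peers):
    """Names of the peers whose connection type is 'Relayed'."""
    return [p["name"] for p in peers if p.get("Connection type") == "Relayed"]


if __name__ == "__main__":
    for peer in parse_peers(SAMPLE):
        print(peer["name"], peer.get("Connection type"), peer.get("Relay server address", "-"))
```

A peer that shows "Connection type: Relayed" together with a rel:// or rels:// relay server address is going through the new relay; a P2P peer is connected directly.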

mlsmaycon avatar Sep 09 '24 19:09 mlsmaycon

Hi,

I have some questions about the new relay which are not clear to me.

  1. In the release notes you wrote "We are moving away from the TURN relay (coturn) to our own relay implementation based on WebSocket". Taken literally, this means that only the TURN part of coturn gets replaced, not the STUN part. Is this correct, and is this release only the first step toward replacing coturn completely, or is the STUN part already replaced by the new relay as well?
  2. The example above, which shows the relay in use, appears to use a secured connection (rels://), but the release notes only mention "Addresses": ["rel://<DOMAIN>:<PORT>"]. Can TLS be enabled in the new relay, and if so, how? Or is this something for a future release?
  3. I am using Traefik as a reverse proxy and have set up NetBird as described in the documentation, which works well. However, I cannot find documentation on running the new relay behind a reverse proxy.

Thanks in advance and also many thanks for your awesome work in building this great software stack!

allroundtechie avatar Sep 09 '24 20:09 allroundtechie

@landmass-deftly-reptile-budget:

  1. STUN is still going to be required for P2P discovery. Also, for retro-compatibility, TURN is still required.
  2. The supported URL schemes are rel:// and rels://, where rels:// is used for TLS connections. Like signal and management, the relay has Let's Encrypt support, which you can enable with the environment variables below:
NB_EXPOSED_ADDRESS=rels://relay.example.com:443  # update the port configuration to match it
NB_LETSENCRYPT_DOMAINS=relay.example.com # should match the exposed address
NB_LETSENCRYPT_DATA_DIR=/etc/letsencrypt # mount this directory for persistency
[email protected]
#NB_LETSENCRYPT_AWS_ROUTE53=true # in case you want to use route 53 for issuing the certificate

It also supports certificate files with:

NB_TLS_CERT_FILE=/etc/certificates/cert.crt
NB_TLS_KEY_FILE=/etc/certificates/cert.key

Once this is done, add the exposed address to the management.json file and restart the management service.

  3. The relay should work fine behind Traefik. We have not documented the configuration yet, but traffic to the service can be routed either by domain or by the /relay path prefix.

mlsmaycon avatar Sep 09 '24 21:09 mlsmaycon

Hello, I have 2 questions. I am undecided whether to upgrade or not.

  1. I don't fully understand the new Relay Feature. How will it benefit us?
  2. What was the reason to switch to our own relay application? Was there something that the existing system did not meet?

ismail0234 avatar Sep 09 '24 21:09 ismail0234

Is it okay to update to 0.29.0 without actually running the new relay image and changing management.json?

bryanjuho avatar Sep 10 '24 01:09 bryanjuho

Has a new OpenWrt package been released that supports the new relay?

rudradevpal avatar Sep 10 '24 04:09 rudradevpal

Is it okay to use the same domain for management, signal, coturn and relay?

Example: If I use the domain netbird.domain.com for all services, but with different ports, is that okay?

Marcus1Pierce avatar Sep 10 '24 07:09 Marcus1Pierce

  1. Also, for retro-compatibility, TURN is still required.

If I don't care about old clients, I can ignore TURN completely, right?

Otherwise, this sounds very promising, especially with Kubernetes, the port ranges of TURN have always made the setup a bit more complex. I will definitely give it a try and report back.

STUN will continue to be used in the future?

Zaunei avatar Sep 10 '24 11:09 Zaunei

Replace PORT and DOMAIN according to your deployment.

I have used the automatic setup script, so I am probably using the default values for ports, so what do I need to specify here for PORT in the compose file?

WolfgangDpunkt avatar Sep 10 '24 11:09 WolfgangDpunkt

It can be found in the setup.env file. The default port is 33080

ndziuba avatar Sep 10 '24 11:09 ndziuba

Hello,

Is it possible to run the relay behind nginx acting as a proxy? I have tried adding the following to my nginx configuration file, but it results in clients receiving a 400 error when trying to establish a connection to the relay. A direct connection without nginx in front works perfectly fine.

upstream relay-upstream {
    server 127.0.0.1:33080;
}

[...]

# Proxy Relay http endpoint
    location /relay/ {
        proxy_pass http://relay-upstream/relay;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }

MDMeridio001 avatar Sep 10 '24 12:09 MDMeridio001

Try adding these:

      proxy_set_header Host            $http_host;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-For $remote_addr;
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_cache_bypass $http_upgrade;

and delete the directive:

proxy_set_header Host $host;

mvivaldi avatar Sep 10 '24 13:09 mvivaldi

Hello, I have 2 questions. I am undecided whether to upgrade or not.

  1. I don't fully understand the new Relay Feature. How will it benefit us?
  2. What was the reason to switch to our own relay application? Was there something that the existing system did not meet?

@ismail0234 some of the benefits of the new relay over Coturn:

  • More efficient relay connection for multiple peers: The ICE mode with TURN opens a connection with the TURN server for each peer connection. That consumes more resources on the client and on the coturn server.
  • The NetBird connection with the new relay is up to 15% faster than coturn.
  • The service is easier to run in self-hosted environments, since you only need to configure a single port.
  • Built-in TLS/SSL support

The main idea is to have a more efficient relay system for NetBird. TURN/Coturn is a really good system for short-term connections. Since a VPN connection usually lasts many hours or days, we need a more efficient system that can easily be scaled.

mlsmaycon avatar Sep 10 '24 14:09 mlsmaycon

Is it okay to update to 0.29.0 without actually running the new relay image and changing management.json?

Yes, it is. You don't need to update or configure anything if you don't want to. It should be fully compatible with older versions of the management.json file.

mlsmaycon avatar Sep 10 '24 14:09 mlsmaycon

Has a new OpenWrt package been released that supports the new relay?

We will look into updating the OpenWrt version.

mlsmaycon avatar Sep 10 '24 14:09 mlsmaycon

Is it okay to use the same domain for management, signal, coturn and relay?

Example: If I use the domain netbird.domain.com for all services, but with different ports, is that okay?

Yes, it is possible.

mlsmaycon avatar Sep 10 '24 14:09 mlsmaycon

Hello, is it possible to run the relay behind nginx acting as a proxy? I have tried adding the following to my nginx configuration file, but it results in clients receiving a 400 error when trying to establish a connection to the relay. A direct connection without nginx in front works perfectly fine.

upstream relay-upstream {
    server 127.0.0.1:33080;
}

[...]

# Proxy Relay http endpoint
    location /relay/ {
        proxy_pass http://relay-upstream/relay;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }

Try adding these:

      proxy_set_header Host            $http_host;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Forwarded-For $remote_addr;
      proxy_set_header X-Forwarded-Host $http_host;
      proxy_cache_bypass $http_upgrade;

and delete the directive:

proxy_set_header Host $host;

I added them but I am still getting the same error. I don't know if it is of any help but this is what I added to the docker-compose.yml file:

# Relay
  relay:
    image: netbirdio/relay:latest
    restart: unless-stopped
    environment:
    - NB_LOG_LEVEL=info
    - NB_LISTEN_ADDRESS=:33080
    - NB_EXPOSED_ADDRESS=netbird.mydomain.com:443
    - NB_AUTH_SECRET=<MYSECRET>
    ports:
      - 127.0.0.1:33080:33080
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

And this is what I added to management.json:

"Relay": {
        "Addresses": ["rel://netbird.mydomain.com:443/relay"],
        "CredentialsTTL": "24h",
        "Secret": "<MYSECRET>"
},

MDMeridio001 avatar Sep 10 '24 14:09 MDMeridio001

@MDMeridio001 it seems like you are using nginx for SSL termination too, in that case, try this:

    - NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443

and

"Relay": {
        "Addresses": ["rels://netbird.mydomain.com:443"],
        "CredentialsTTL": "24h",
        "Secret": "<MYSECRET>"
},
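
The failing and working configurations in this exchange differ in two ways: the exposed address was missing its rel:// or rels:// scheme, and it did not match the entry in management.json. A small Python sketch of a check for both is given below. It is an illustration only: the function names are hypothetical, the "exposed address must appear in Addresses" rule is an assumption drawn from the working examples in this thread, and the function expects a complete JSON object rather than the "Relay" fragment shown above.

```python
import json

# Schemes named in this thread: rel:// (plain) and rels:// (TLS).
ALLOWED_SCHEMES = ("rel", "rels")


def scheme_of(address):
    """Return the URL scheme of address, or "" when no "://" is present."""
    scheme, sep, _rest = address.partition("://")
    return scheme if sep else ""


def check_relay_config(exposed_address, management_json):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if scheme_of(exposed_address) not in ALLOWED_SCHEMES:
        problems.append("NB_EXPOSED_ADDRESS should start with rel:// or rels://")

    relay = json.loads(management_json).get("Relay", {})
    addresses = relay.get("Addresses", [])
    for address in addresses:
        if scheme_of(address) not in ALLOWED_SCHEMES:
            problems.append("Relay address without rel:// or rels://: " + address)

    # Assumption: the working setups in this thread use identical values in both places.
    if exposed_address not in addresses:
        problems.append("NB_EXPOSED_ADDRESS is not listed in Relay.Addresses")
    return problems


if __name__ == "__main__":
    management = json.dumps({
        "Relay": {
            "Addresses": ["rels://netbird.mydomain.com:443"],
            "CredentialsTTL": "24h",
            "Secret": "<MYSECRET>",
        }
    })
    print(check_relay_config("rels://netbird.mydomain.com:443", management))
```

Running the same check against the earlier values (an exposed address of netbird.mydomain.com:443 and a management.json entry of rel://netbird.mydomain.com:443/relay) reports both the missing scheme and the mismatch.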

mlsmaycon avatar Sep 10 '24 14:09 mlsmaycon

I completely forgot I needed to add "rels://", thank you so much, it's working fine now.

MDMeridio001 avatar Sep 10 '24 14:09 MDMeridio001

Assuming a brand new deployment with all clients running 0.29+, where does coturn fit in the picture? Can we just run coturn with --stun-only if retro-compatibility is no concern?

rgdev avatar Sep 10 '24 15:09 rgdev

@rgdev With a new deployment, it is very likely that Coturn will only be used with mobile clients until we update them.

mlsmaycon avatar Sep 10 '24 16:09 mlsmaycon

Excuse my confusion: you say that STUN is still used for peer discovery, yet Coturn will no longer be used once the mobile apps are updated. Does that mean the STUN service is now built into the new relay (or the management service)? Would we ultimately be able to remove Coturn from Docker Compose and management.json? Thank you very much for this new implementation; it sounds cool and production friendly.

Roeda avatar Sep 10 '24 17:09 Roeda

Has a new OpenWrt package been released that supports the new relay?

We will look into updating the OpenWrt version.

I have been updating the netbird package against the OpenWrt snapshot for months and have had no problems so far. In fact, I have built the new version 0.29.0, which is working fine, and opened a PR: https://github.com/openwrt/packages/pull/24950. For now I just see this error:

2024-09-10T15:25:09-03:00 INFO [peer: [ REDACTED ]=] client/internal/peer/worker_relay.go:59: Relay is not supported by remote peer

This is probably because I'm not self-hosting, and from the release notes:

  • Cloud support for the new relay feature is coming soon*.

However, I'm not backporting to OpenWrt 23.05, since one of my targets is supported only on the OpenWrt snapshot.

To be honest, someone opened an issue (https://github.com/openwrt/packages/issues/24569#issuecomment-2246451384) on the OpenWrt repo asking to backport a new version. I offered to help if they could test it, but I got no response.

wehagy avatar Sep 10 '24 18:09 wehagy

@mlsmaycon Thanks for the explanation. Are you considering optimizations on the API side? The API slows down once 200 peers are connected to the system. Beyond 500 peers it slows down a lot, and each request takes more than 1-2 seconds.

In my test measurements, these were the API response times by number of peers connected to the system:

  • 20 peers: 200-300 ms
  • 100 peers: 300-600 ms
  • 200 peers: 500-1000 ms
  • 500 peers: 1500-3000 ms

ismail0234 avatar Sep 10 '24 20:09 ismail0234

Hey folks, we have a new release, 0.29.1. This release improves the relay with better authentication messages. To ensure your system is working properly, you should upgrade your relay and management servers before upgrading your clients.

mlsmaycon avatar Sep 11 '24 18:09 mlsmaycon

Works like a charm, thanks!

allroundtechie avatar Sep 11 '24 19:09 allroundtechie

Thanks for improving the relay functionality. I can't find the relay repo in netbirdio github. Will it be private or closed source?

marcportabellaclotet-mt avatar Sep 11 '24 20:09 marcportabellaclotet-mt

@marcportabellaclotet-mt

https://github.com/netbirdio/netbird/tree/main/relay

allroundtechie avatar Sep 11 '24 20:09 allroundtechie

A short example for traefik which is working fine for me:

docker-compose.yml

relay:
    image: "netbirdio/relay:latest"
    container_name: netbird-relay
    restart: unless-stopped
    env_file:
      - relay.env
      - common.env
    labels:
      traefik.enable: 'true'
      traefik.http.routers.netbird-relay.rule: 'Host("netbird.mydomain.com") && PathPrefix("/relay")'
      traefik.http.routers.netbird-relay.entrypoints: websecure
      traefik.http.routers.netbird-relay.service: netbird-relay-service
      traefik.http.services.netbird-relay-service.loadbalancer.server.port: 33080

relay.env

NB_LOG_LEVEL=info
NB_LISTEN_ADDRESS=:33080
NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443/relay
NB_AUTH_SECRET=secret

management.json

"Relay": {
        "Addresses": ["rels://netbird.mydomain.com:443/relay"],
        "CredentialsTTL": "24h",
        "Secret": "secret"
    },

ptpu avatar Sep 11 '24 20:09 ptpu

Relay compose file

  relay:
    image: netbirdio/relay:latest
    container_name: netbird_relay
    restart: unless-stopped
    environment:
    - NB_LOG_LEVEL=info
    - NB_LISTEN_ADDRESS=:33080
    - NB_EXPOSED_ADDRESS=rels://netbird.mydomain.com:443
    - NB_AUTH_SECRET=secret
    ports:
      - 33080:33080
    networks:
      - proxynet
    logging:
      driver: "json-file"
      options:
        max-size: "500m"
        max-file: "2"

management.json

   "Relay": {
    "Addresses": ["rels://netbird.mydomain.com:443"],
    "CredentialsTTL": "24h",
    "Secret": "secret"
    },

netbird.subdomain.conf

server {
    listen 443 ssl;
    listen [::]:443 ssl;

    server_name netbird.mydomain.com;

    include /config/nginx/ssl3.conf;

    client_max_body_size 128M;
    client_header_timeout 1d;
    client_body_timeout 1d;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app netbird_dashboard;
        set $upstream_port 80;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /api {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app netbird_management;
        set $upstream_port 443;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /signalexchange.SignalExchange/ {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;

        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        grpc_socket_keepalive on;

        set $upstream_app netbird_signal;
        set $upstream_port 80;
        set $upstream_proto grpc;
        grpc_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /management.ManagementService/ {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;

        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
        grpc_socket_keepalive on;

        set $upstream_app netbird_management;
        set $upstream_port 443;
        set $upstream_proto grpc;
        grpc_pass $upstream_proto://$upstream_app:$upstream_port;

    }

    location /relay/ {
        proxy_pass http://netbird_relay:33080/relay;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        
        # Forward headers
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeout settings
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
        proxy_connect_timeout 60s;

        # Handle upstream errors
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
    }


}


I use the SWAG reverse proxy, which just bundles nginx and Let's Encrypt; my config files are above. I'm trying to add the new relay service. When I fire up my docker client/agent I get this error in the logs for it:

UPDATE: the current relay location I have now works.

pugnobellum avatar Sep 11 '24 20:09 pugnobellum