[Bug]: Brand new installation of v0.21.0 results in a network of hosts that cannot talk to each other, except the Netmaker default host.
Contact Details
What happened?
A brand-new installation of Netmaker v0.21.0 and netclient v0.21.0, on a network known to work with Netmaker/netclient v0.20.5 and earlier, no longer works.
The immediate sign is that the Netmaker UI shows persistent "error reaching broker" errors right after installation. This seems to resolve itself after the system is left alone for a while, which is itself suspicious. I was then able to configure a network and quickly verify that the hosts on the network could ping each other.
After a couple of idle hours, the system fell apart on its own: the hosts could no longer ping or otherwise communicate with each other, except between the Netmaker host and each host on the network. The TURN server logs show TURN failing completely, and overnight this generated about 1 GB of logs. Ouch.
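For anyone trying to reproduce this, a quick way to confirm that the WireGuard tunnels themselves are down (rather than just the UI/broker misreporting) is to check peer handshakes and ping the overlay addresses directly. This is a rough sketch; the `netmaker` interface name and the example peer address are assumptions and will differ per setup.

```bash
# On an affected host, check when each WireGuard peer last completed a handshake.
# A zero/stale timestamp means the tunnel itself is down, not just the broker/UI.
sudo wg show netmaker latest-handshakes

# Then try to reach another host over its overlay address (example address assumed).
ping -c 3 10.101.0.2
```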
I'm not sure why the architecture is moving towards STUN, TURN, etc., but far from improving connectivity, Netmaker is becoming much less stable and performant. This seems like a major step backwards, not forwards. Something is quite broken here.
Version
v0.21.0
What OS are you using?
No response
Relevant log output
docker logs turn
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:55070: no allocation found 206.174.182.90:55070:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:55070: no allocation found 206.174.182.90:55070:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:55070: no allocation found 206.174.182.90:55070:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35968: no allocation found 206.174.182.90:35968:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35968: no allocation found 206.174.182.90:35968:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:45803: no allocation found 206.174.182.90:45803:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:41533: no allocation found 206.174.182.90:41533:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:35087: no allocation found 206.174.182.90:35087:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:35087: no allocation found 206.174.182.90:35087:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35087: no allocation found 206.174.182.90:35087:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:60311: no allocation found 206.174.182.90:60311:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:60311: no allocation found 206.174.182.90:60311:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:41533: no allocation found 206.174.182.90:41533:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:49227: no allocation found 206.174.182.90:49227:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:56555: no allocation found 206.174.182.90:56555:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:35780: no allocation found 206.174.182.90:35780:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:52578: no allocation found 206.174.182.90:52578:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle Send-indication from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:42 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:53242: no allocation found 206.174.182.90:53242:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:60311: no allocation found 206.174.182.90:60311:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:33754: no allocation found 206.174.182.90:33754:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle Send-indication from 206.174.182.90:35780: no allocation found 206.174.182.90:35780:[::]:3479
turn ERROR: 2023/09/19 19:34:43 error when handling datagram: failed to handle ChannelBind-request from 206.174.182.90:43463: no allocation found 206.174.182.90:43463:[::]:3479
Contributing guidelines
- [X] Yes, I did.
If you reboot your server, do the log entries stop spamming? That's what happened on my VPS.
> If you reboot your server, do the log entries stop spamming? That's what happened on my VPS.
No, unfortunately.
Reverting the TURN server from 1.1 to 1.0 makes the system functional. A from-scratch installation also completes without the UI reporting that the broker is unreachable.
However, the TURN server still shows a lot of the same log spam.
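For anyone who wants to try the same workaround, the revert boils down to pinning the turn service image back to the 1.0 release and recreating that one container. A rough sketch, run from the directory containing the Netmaker docker-compose.yml; the `v1.1.0` tag and the `turn` service name are assumptions about what the 0.21.0 installer generated, so adjust to whatever your file actually pins.

```bash
# Pin the turn service back to the v1.0.0 image (assumes the file currently says v1.1.0).
sed -i 's|gravitl/turnserver:v1.1.0|gravitl/turnserver:v1.0.0|' docker-compose.yml

# Recreate only the turn container with the older image.
docker compose up -d --force-recreate turn
```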
@Aeoran
> Reverting the TURN server from 1.1 to 1.0 makes the system functional. A from-scratch installation also completes without the UI reporting that the broker is unreachable.
> However, the TURN server still shows a lot of the same log spam.
Hi, I have the same issue with the turn docker image gravitl/turnserver:v1.0.0: the same log spam, and hosts that cannot talk to each other except the Netmaker default host.
Could you please share your docker-compose.yml?
```yaml
version: "3.4"

services:
  netmaker:
    container_name: netmaker
    image: gravitl/netmaker:$SERVER_IMAGE_TAG
    env_file: ./netmaker.env
    restart: always
    volumes:
      - dnsconfig:/root/config/dnsconfig
      - sqldata:/root/data
    environment:
      # config-dependant vars
      - STUN_LIST=stun1.netmaker.io:3478,stun2.netmaker.io:3478,stun1.l.google.com:19302,stun2.l.google.com:19302
      # The domain/host IP indicating the mq broker address
      - BROKER_ENDPOINT=wss://broker.${NM_DOMAIN}
      # The base domain of netmaker
      - SERVER_NAME=${NM_DOMAIN}
      - SERVER_API_CONN_STRING=api.${NM_DOMAIN}:443
      # Address of the CoreDNS server. Defaults to SERVER_HOST
      - COREDNS_ADDR=${SERVER_HOST}
      # Overrides SERVER_HOST if set. Useful for making HTTP available via different interfaces/networks.
      - SERVER_HTTP_HOST=api.${NM_DOMAIN}
      # domain for your turn server
      - TURN_SERVER_HOST=turn.${NM_DOMAIN}
      # domain of the turn api server
      - TURN_SERVER_API_HOST=https://turnapi.${NM_DOMAIN}

  netmaker-ui:
    container_name: netmaker-ui
    image: gravitl/netmaker-ui:$UI_IMAGE_TAG
    env_file: ./netmaker.env
    environment:
      # config-dependant vars
      # URL where UI will send API requests. Change based on SERVER_HOST, SERVER_HTTP_HOST, and API_PORT
      BACKEND_URL: "https://api.${NM_DOMAIN}"
    depends_on:
      - netmaker
    links:
      - "netmaker:api"
    restart: always

  caddy:
    image: caddy:2.6.2
    container_name: caddy
    env_file: ./netmaker.env
    restart: unless-stopped
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - ./certs:/root/certs
      - caddy_data:/data
      - caddy_conf:/config
    ports:
      - "80:80"
      - "443:443"

  coredns:
    container_name: coredns
    image: coredns/coredns
    command: -conf /root/dnsconfig/Corefile
    env_file: ./netmaker.env
    depends_on:
      - netmaker
    restart: always
    volumes:
      - dnsconfig:/root/dnsconfig

  mq:
    container_name: mq
    image: eclipse-mosquitto:2.0.15-openssl
    env_file: ./netmaker.env
    depends_on:
      - netmaker
    restart: unless-stopped
    command: ["/mosquitto/config/wait.sh"]
    volumes:
      - ./mosquitto.conf:/mosquitto/config/mosquitto.conf
      - ./wait.sh:/mosquitto/config/wait.sh
      - mosquitto_logs:/mosquitto/log
      - mosquitto_data:/mosquitto/data

  turn:
    container_name: turn
    image: gravitl/turnserver:v1.0.0
    env_file: ./netmaker.env
    environment:
      # config-dependant vars
      - USERNAME=${TURN_USERNAME}
      - PASSWORD=${TURN_PASSWORD}
      # domain for your turn server
      - TURN_SERVER_HOST=turn.${NM_DOMAIN}
    network_mode: "host"
    volumes:
      - turn_server:/etc/config
    restart: always

volumes:
  caddy_data: {} # runtime data for caddy
  caddy_conf: {} # configuration file for Caddy
  sqldata: {}
  dnsconfig: {} # storage for coredns
  mosquitto_logs: {} # storage for mqtt logs
  mosquitto_data: {} # storage for mqtt data
  turn_server: {}
```
Had this issue also. Deleted all my docker containers and reran the script. It does work now
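For reference, that reset amounts to something like the following, run from the Netmaker install directory. This is a sketch; `nm-quick.sh` is assumed to be the installer script originally used, so substitute whatever you ran the first time.

```bash
# Stop and remove the existing Netmaker containers (named volumes are kept;
# add -v only if you also want to wipe server data).
docker compose down

# Confirm nothing Netmaker-related is still running, then re-run the installer.
docker ps -a
./nm-quick.sh
```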
Which containers are you using? Newer than 0.21.0?
I think the default docker-compose.yml uses the wrong path for mounting the TURN container's files. This means there is no data persistence, and creating a new container wipes its state, which results in the logs you are observing.
Try changing:

    volumes:
      - turn_server:/etc/config

to:

    volumes:
      - turn_server:/root/etc/turn/config
Unfortunately, old data is already lost, so only newly connected clients will benefit from this.
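A quick way to check that the new mount actually persists state across container recreation is sketched below; the in-container path is taken from the suggestion above and remains an assumption about where the image writes its data.

```bash
# Recreate the turn container with the updated volume mapping.
docker compose up -d --force-recreate turn

# Look inside the container at the path the named volume should now cover.
docker exec turn ls -la /root/etc/turn/config

# Confirm the named volume exists (its full name is prefixed with your compose project name).
docker volume ls | grep turn_server
```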