[Bug]: Agents Offline after Hub Update
Component
Hub & Agent
Description
After updating Beszel Hub to 13.0 and then 13.1 (Proxmox LXC with no firewall) using "./beszel update", all associated Beszel 13.0 Agents (Windows 11 clients with no firewalls) now show as down.
I checked the port connections on both ends, and the hub+clients all show established TCP connections as expected just without data flow. Logs in PocketBase show 'System Down, err: EOF'.
I've tried a fresh install on both ends, including disabling the firewalls as stated above, but I still can't get functionality back. All services have been confirmed running. Any ideas on what else I can troubleshoot?
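One way to narrow down "connection established but no data" is to probe the agent's port directly and see whether anything comes back. The sketch below is a rough diagnostic, not part of Beszel: it assumes the agent's SSH listener is on port 45876 (the port mentioned later in this thread) and that it sends a standard SSH identification line on connect, as SSH servers normally do. The function name read_ssh_banner is made up for illustration.

```python
import socket


def read_ssh_banner(host: str, port: int = 45876, timeout: float = 5.0) -> str:
    """Connect to host:port and return the first line the server sends.

    Returns an empty string if the server closes the connection without
    sending anything. Raises OSError (which includes timeouts) if the
    connection fails or nothing arrives before the timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # An SSH identification line is at most 255 bytes including CR LF.
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(255 - len(data))
            if not chunk:  # peer closed the connection
                break
            data += chunk
    return data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()


# Example (replace the placeholder with an agent's address):
# print(read_ssh_banner("AGENT_IP"))
```

If this returns a line starting with "SSH-", the listener is answering and the problem is more likely in the handshake or key exchange. An empty string or a timeout would suggest the agent accepts the connection but never responds.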
Expected Behavior
After updating via "./beszel update", Beszel Hub should reconnect to existing agents.
Steps to Reproduce
- Install Beszel Hub on Proxmox LXC
- Install Agent(s) on Windows 11 clients
- Update Beszel Hub via "./beszel update"
- Confirm functionality of existing agents
Category
Installation
Affected Metrics
Other
OS / Architecture
debian 12 LXC (Hub), Windows 11 LTSC (Agents)
Beszel version
13.0
Installation method
Binary
Configuration
Hub Logs
Agent Logs
Are these agents connected via SSH or WebSocket? Both should work, so I'll look further into it, but if connected via SSH, please try switching an agent to WebSocket by providing HUB_URL and TOKEN (/settings/tokens).
https://beszel.dev/guide/environment-variables#windows
Thank you for the quick response! Agents are connected via SSH. I'll give the WebSocket switch a try. Just to confirm, the agents will switch to WebSocket automatically if those variables are provided?
Yes, it should use WS if that is properly configured. If not, check the agent logs for what the error was.
Ok, I attempted to set the variables as such:
nssm set beszel-agent AppEnvironmentExtra "+TOKEN=143a0bd5-......."
nssm set beszel-agent AppEnvironmentExtra "+HUB_URL=192.168.100.76"
Both variables returned a successful nssm message, however the service won't start now.
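One thing that may be worth checking, as an assumption rather than a confirmed cause: the HUB_URL value above is a bare IP address with no scheme or port. If the agent expects a full URL (the variable name suggests so, but this thread does not confirm it), a quick sanity check could look like the sketch below. The helper name check_hub_url and the port 8090 in the example are illustrative assumptions.

```python
from urllib.parse import urlsplit


def check_hub_url(value: str) -> list:
    """Return a list of human-readable problems found in a HUB_URL value."""
    problems = []
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    if not parts.netloc:
        problems.append("missing host")
    return problems


# The value from the commands above, then a full URL
# (port 8090 is an assumed example, not taken from this thread).
print(check_hub_url("192.168.100.76"))
print(check_hub_url("http://192.168.100.76:8090"))
```

An empty list means the value at least parses as an http(s) URL. It says nothing about whether the hub is reachable at that address.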
Do the existing "KEY" and "PORT" env vars need to be reset for the WS vars to work? As such:
nssm reset beszel-agent AppEnvironmentExtra "PORT"
nssm reset beszel-agent AppEnvironmentExtra "KEY"
Also, something of note...I've reinstalled the agent multiple times to make sure I'm not going crazy, but the agent logs do not populate in C:\Program Data\beszel-agent\logs
I can't seem to find the agent logs anywhere. To me it would point to maybe a bad compile and install, but it worked on the initial install, so I'm not sure it's that.
Additional question
Do you know where nssm writes the AppEnvironmentExtra vars?
When I run nssm get on the variables I set, it doesn't return anything. I also don't see them in the Windows environment variables GUI.
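On where nssm keeps these values: nssm stores a service's settings in the registry under HKLM\SYSTEM\CurrentControlSet\Services\<service>\Parameters, with AppEnvironmentExtra as a multi-string value. They are per-service, so they would not be expected to appear in the Windows environment variables GUI. A minimal sketch for reading them back, assuming the service is named beszel-agent as in the commands above (the function names are made up for illustration):

```python
import sys


def nssm_parameters_key(service: str) -> str:
    """Registry path (under HKEY_LOCAL_MACHINE) of a service's nssm settings."""
    return "SYSTEM\\CurrentControlSet\\Services\\" + service + "\\Parameters"


def read_app_environment_extra(service: str) -> list:
    """Return the AppEnvironmentExtra entries for a service (Windows only).

    Raises FileNotFoundError if the key or value does not exist.
    """
    if sys.platform != "win32":
        raise RuntimeError("the registry is only available on Windows")
    import winreg  # imported lazily so this file still loads elsewhere

    path = nssm_parameters_key(service)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, value_type = winreg.QueryValueEx(key, "AppEnvironmentExtra")
    if value_type != winreg.REG_MULTI_SZ:
        raise ValueError("AppEnvironmentExtra is not a multi-string value")
    return list(value)


# Example, on the Windows machine running the agent:
# print(read_app_environment_extra("beszel-agent"))
```

If the value is missing or does not contain TOKEN and HUB_URL, the earlier nssm set calls probably did not store what was intended.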
@henrygd I didn't want to open a new issue just to ask:
Is there a way to get agents to never use ssh anymore, and only use websocket?
Sometimes I find some of my agents are "falling back" to ssh (I think this happens when I reboot my Traefik reverse proxy). All I have to do is run service beszel-agent restart and they start using WS again. But it would be nice if they would instead just periodically retry WS automatically, or I could force them not to fall back to SSH.
(using v0.15.0 btw)
@luckman212 The best way is probably to close the port on the agent side so the SSH connection is blocked. Or change the port the agent is listening on.
Alternatively, you can change the "HOST / IP" value of the system to something like /tmp/beszel.sock. However this will also be displayed in the UI instead of the real host.
Thanks @henrygd
None of those options sounds "clean" to me. I suppose closing the port on the agent might be the cleanest, but that gets unwieldy after a couple dozen agents.
What about blocking outbound SSH connections from the hub itself? I could do that easily with a firewall rule. Do you think that would work?
I ended up coming up with a small OpenRC init file (my Docker host is Alpine) that blocks outbound TCP 45876 only from the Hub container using iptables. It seems to work well enough. Will keep testing though.
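For anyone wanting to try the same workaround, here is a rough sketch of what such a rule might look like. The actual init file was not posted, so everything here is assumed: the use of Docker's DOCKER-USER chain, the REJECT target, and the Hub container IP 172.18.0.5 are all illustrative. Only the port 45876 comes from the comment above. The script builds and prints the command without running it.

```python
import shlex

AGENT_SSH_PORT = 45876  # port named in the comment above


def build_block_rule(hub_ip: str, port: int = AGENT_SSH_PORT) -> list:
    """Build an iptables command that rejects outbound TCP from hub_ip to port.

    Assumes the Hub runs in a Docker container, so forwarded traffic passes
    through the DOCKER-USER chain; adjust the chain for other setups.
    """
    return [
        "iptables", "-I", "DOCKER-USER",
        "-s", hub_ip,
        "-p", "tcp", "--dport", str(port),
        "-j", "REJECT",
    ]


# 172.18.0.5 is a hypothetical container address; substitute the Hub's own.
print(shlex.join(build_block_rule("172.18.0.5")))
```

Printing the command first makes it easy to review before applying it as root. A container's IP can change on restart, so a real init script would probably need to look it up each time.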
Just circling back to update that the workaround using iptables to block the outbound SSH connections from the Hub seems to be working well enough. All of my agents are consistently using WebSocket (push) connections.
Can we please keep the issue open though? It would be ideal if this workaround wasn't necessary, via an env var such as DISABLE_OUTBOUND_SSH=true. Perhaps the issue title should be renamed to reflect this.