[Bug]: Agents Offline after Hub Update

Open Nicky121359 opened this issue 4 months ago • 10 comments

Component

Hub & Agent

Description

After updating Beszel Hub to 13.0 and 13.1 (Proxmox LXC with no firewall) using "./beszel update", all associated Beszel 13.0 Agents (Windows 11 clients with no firewalls) now show as down.

I checked the port connections on both ends, and the hub and clients all show established TCP connections as expected, but no data flows over them. Logs in PocketBase show 'System Down, err: EOF'.

I've tried a fresh install on both ends, including disabling the firewalls as stated above, but I still can't get functionality back. All services have been confirmed running. Any ideas on what else I can troubleshoot?

Expected Behavior

After updating via "./beszel update", Beszel Hub should reconnect to existing agents.

Steps to Reproduce

  1. Install Beszel Hub on Proxmox LXC
  2. Install Agent(s) on Windows 11 clients
  3. Update Beszel Hub via "./beszel update"
  4. Confirm functionality of existing agents

Category

Installation

Affected Metrics

Other

OS / Architecture

debian 12 LXC (Hub), Windows 11 LTSC (Agents)

Beszel version

13.0

Installation method

Binary

Configuration


Hub Logs


Agent Logs


Nicky121359 avatar Oct 06 '25 16:10 Nicky121359

Are these agents connected via SSH or WebSocket? Both should work, so I'll look further into it, but if connected via SSH, please try switching an agent to WebSocket by providing HUB_URL and TOKEN (/settings/tokens).

https://beszel.dev/guide/environment-variables#windows

henrygd avatar Oct 06 '25 17:10 henrygd

Thank you for the quick response! Agents are connected via SSH. I'll give the WebSocket switch a try. Just to confirm, the agents will switch to WebSocket automatically if those variables are provided?

Nicky121359 avatar Oct 06 '25 17:10 Nicky121359

Yes, it should use WS if that is properly configured. If not, check the agent logs for what the error was.

henrygd avatar Oct 06 '25 17:10 henrygd

Ok, I attempted to set the variables as such:

nssm set beszel-agent AppEnvironmentExtra "+TOKEN=143a0bd5-......."
nssm set beszel-agent AppEnvironmentExtra "+HUB_URL=192.168.100.76"

Both variables returned a successful nssm message, however the service won't start now.

Do the existing "KEY" and "PORT" env vars need to be reset for the WS vars to work? As such:

nssm reset beszel-agent AppEnvironmentExtra "PORT"
nssm reset beszel-agent AppEnvironmentExtra "KEY"

Also, something of note...I've reinstalled the agent multiple times to make sure I'm not going crazy, but the agent logs do not populate in C:\Program Data\beszel-agent\logs

I can't seem to find the agent logs anywhere. To me it would point to maybe a bad compile and install, but it worked on the initial install, so I'm not sure it's that.

Nicky121359 avatar Oct 06 '25 18:10 Nicky121359

Additional question

Do you know where nssm writes the AppEnvironmentExtra vars?

When I run nssm get on the variables I set, it doesn't return anything. I also don't see them in the Windows environment variables GUI.

Nicky121359 avatar Oct 06 '25 18:10 Nicky121359

@henrygd I didn't want to open a new issue just to ask:

Is there a way to get agents to never use ssh anymore, and only use websocket?

Sometimes I find some of my agents are "falling back" to ssh (I think this happens when I reboot my Traefik reverse proxy). All I have to do is run service beszel-agent restart and they start using WS again. But it would be nice if they would instead just periodically retry WS automatically, or I could force them not to fall back to SSH.

(using v0.15.0 btw)

luckman212 avatar Oct 26 '25 21:10 luckman212

@luckman212 The best way is probably to close the port on the agent side so the SSH connection is blocked. Or change the port the agent is listening on.

Alternatively, you can change the "HOST / IP" value of the system to something like /tmp/beszel.sock. However this will also be displayed in the UI instead of the real host.

henrygd avatar Oct 26 '25 22:10 henrygd

Thanks @henrygd

None of those options sound "clean" to me. I suppose closing the port on the agent might be the cleanest, but it gets unwieldy after a couple of dozen agents.

What about blocking outbound SSH connections from the hub itself? I could do that easily with a firewall rule. Do you think that would work?

luckman212 avatar Oct 27 '25 00:10 luckman212

I ended up coming up with a small OpenRC init file (my Docker host is Alpine) that blocks outbound TCP 45876 only from the Hub container using iptables. It seems to work well enough. Will keep testing though.
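
For anyone wanting to try the same idea, here is a rough sketch of what such a rule could look like. It is not the actual init file: the Hub container IP (172.18.0.10), the use of Docker's DOCKER-USER chain, and the REJECT target are all assumptions, and the helper name rule_args is made up for illustration. By default the script only prints the command; it touches the firewall only when APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: block outbound TCP 45876 from the Hub container.
# HUB_IP is a placeholder (assumption); replace it with your Hub container's IP.
HUB_IP="${HUB_IP:-172.18.0.10}"
AGENT_PORT="${AGENT_PORT:-45876}"

# Build the iptables arguments for a given action (-C to check, -I to insert).
# DOCKER-USER is the chain Docker reserves for user-defined forwarding rules.
rule_args() {
    printf '%s DOCKER-USER -s %s -p tcp --dport %s -j REJECT --reject-with tcp-reset' \
        "$1" "$HUB_IP" "$AGENT_PORT"
}

if [ "${APPLY:-0}" = "1" ]; then
    # Insert the rule only if an identical one is not already present.
    # Word splitting of $(rule_args ...) is intentional here.
    iptables $(rule_args -C) 2>/dev/null || iptables $(rule_args -I)
else
    # Dry run: show what would be executed.
    echo "iptables $(rule_args -I)"
fi
```

Running it with APPLY=1 as root adds the rule; deleting it again would use the same arguments with -D. If the Hub is not on a Docker bridge network, a different chain would likely be needed.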

luckman212 avatar Oct 28 '25 12:10 luckman212

Just circling back to update that the workaround using iptables to block the outbound SSH connections from the Hub seems to be working well enough. All of my agents are consistently using WebSocket (push) connections.

Can we please keep the issue open though? It would be ideal if this workaround weren't necessary, for example through an env var such as DISABLE_OUTBOUND_SSH=true. Perhaps the issue title should be renamed to reflect this.

luckman212 avatar Nov 09 '25 16:11 luckman212