Linux MeshAgent Continuous reconnection

Open Coolguy3289 opened this issue 11 months ago • 16 comments

On a Linux Mint 20.1 desktop with MeshAgent installed, the workstation has registered with the MeshCentral server, but I am seeing agent disconnections in the server GUI. When I run meshagent manually, I see the following over and over again. I'm looking for some insight into this error and what can be done to fix it:

Connecting to: wss://meshcentral.domain.com:443/agent.ashx
Connected.
Server verified meshcore... Launching meshcore...
Mesh Server Connection Error [13]
Connecting to: wss://meshcentral.domain.com:443/agent.ashx
Connected.
Server verified meshcore... meshcore already running...
Mesh Server Connection Error [13]
Connecting to: wss://meshcentral.domain.com:443/agent.ashx
^CConnected.

Coolguy3289 avatar Jan 24 '25 21:01 Coolguy3289

what version of meshcentral are you running?
what nodejs version?
what database are you using?
do you use a reverse proxy or direct ssl access?
is your certificate self-signed or valid/trusted for the web site?
what does your config.json look like? (hide secret info like passwords/domain names)

si458 avatar Jan 24 '25 21:01 si458

what version of meshcentral are you running

MeshCentral v1.1.38, Hybrid (LAN + WAN) mode, Production mode.

what nodejs version

v20.16.0

what database are you using

MongoDB

do you use a reverse proxy or direct ssl access

Nginx Reverse Proxy

is your certificate self-signed or valid/trusted for the web site?

Valid/Trusted SSL Cert via Nginx Reverse Proxy

what does your config.json look like (hide secret info like passwords/domain names)

{
  "$schema": "https://raw.githubusercontent.com/Ylianst/MeshCentral/master/meshcentral-config-schema.json",
  "__comment1__": "This is a simple configuration file, all values and sections that start with underscore (_) are ignored. Edit a section and remove the _ in front of the name. Refer to the user's guide for details.",
  "__comment2__": "See node_modules/meshcentral/sample-config-advanced.json for a more advanced example.",
  "settings": {
    "cert": "mc.domain.com",
    "MongoDb": "mongodb://127.0.0.1:27017/meshcentral",
    "_WANonly": true,
    "_LANonly": true,
    "port": 4430,
    "aliasPort": 443,
    "redirPort": 8000,
    "redirAliasPort": 443,
    "tlsOffload": "127.0.0.1",
    "trustedProxy": "127.0.0.1"
  },
  "domains": {
    "": {
      "title": "Remote Access",
      "title2": "Mesh",
      "_minify": true,
      "newAccounts": false,
      "_userNameIsEmail": true,
      "certUrl": "https://127.0.0.1",
      "authStrategies": {
        "azure": {
          "newAccounts": true,
          "clientid": "OMITTED",
          "clientsecret": "OMITTED",
          "tenantid": "OMITTED"
        }
      }
    }
  },
  "_letsencrypt": {
    "__comment__": "Requires NodeJS 8.x or better, Go to https://letsdebug.net/ first before trying Let's Encrypt.",
    "email": "[email protected]",
    "names": "myserver.mydomain.com",
    "skipChallengeVerification": true,
    "production": false
  }
}

Coolguy3289 avatar Jan 30 '25 20:01 Coolguy3289

It's also important to note that this doesn't seem to be happening to every client, only specific ones such as the client with the info I mentioned in the original issue.

Coolguy3289 avatar Jan 30 '25 21:01 Coolguy3289

Don't mean to pester, but is there any further diagnostics I can do on this to help narrow down why this is happening?

Coolguy3289 avatar Feb 24 '25 15:02 Coolguy3289

ok few things to try

  1. certUrl should really be the URL of the domain you use to access meshcentral, e.g. https://my.domain.com. The reason: with traefik, for example, accessing https://192.168.1.69 serves a default traefik SSL certificate, BUT that certificate ISN'T the same one your meshagents see at https://my.domain.com
  2. you should add agentPong: 15 to settings; this sends a ping every 15 seconds to keep the websocket open. But really you should set the websocket timeout on your reverse proxy, see https://ylianst.github.io/MeshCentral/meshcentral/#nginx-reverse-proxy-setup
     # MeshCentral uses long standing web socket connections, set longer timeouts.
     proxy_send_timeout 330s;
     proxy_read_timeout 330s;
  3. redirAliasPort should be 80 and not 443, as this is the redirect port from HTTP to HTTPS

si458 avatar Mar 18 '25 19:03 si458

Alright, so I adjusted the certUrl, added agentPong to the config, and corrected the redirAliasPort. My nginx config was already set up as described in the documentation. Unfortunately, no improvement.

However, when I tail the journal on the MeshCentral server, I am getting TONS of these errors, and I'm wondering if this is why the websocket connection is closing and reconnecting constantly on certain clients:

Mar 18 19:13:06 chi-int-meshcentral node[1720198]: Error: Invalid WebSocket frame: invalid UTF-8 sequence
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at Receiver.dataMessage (/root/node_modules/express-ws/node_modules/ws/lib/receiver.js:508:18)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at Receiver.getData (/root/node_modules/express-ws/node_modules/ws/lib/receiver.js:435:17)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at Receiver.startLoop (/root/node_modules/express-ws/node_modules/ws/lib/receiver.js:143:22)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at Receiver._write (/root/node_modules/express-ws/node_modules/ws/lib/receiver.js:78:10)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at writeOrBuffer (node:internal/streams/writable:570:12)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at _write (node:internal/streams/writable:499:10)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at Writable.write (node:internal/streams/writable:508:10)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at Socket.socketOnData (/root/node_modules/express-ws/node_modules/ws/lib/websocket.js:1164:35)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at Socket.emit (node:events:519:28)
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:     at addChunk (node:internal/streams/readable:559:12) {
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:   code: 'WS_ERR_INVALID_UTF8',
Mar 18 19:13:06 chi-int-meshcentral node[1720198]:   [Symbol(status-code)]: 1007
Mar 18 19:13:06 chi-int-meshcentral node[1720198]: }
Mar 18 19:13:06 chi-int-meshcentral node[1720198]: AGENT WSERR: Error: Invalid WebSocket frame: invalid UTF-8 sequence

Coolguy3289 avatar Mar 18 '25 19:03 Coolguy3289
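
For readers following along: close code 1007 in the trace above is the WebSocket status for a message whose data does not match its type, such as non-UTF-8 bytes in a text frame. The snippet below is only a minimal sketch of that kind of strict check, using Node's built-in TextDecoder. It is not the ws library's own code, and isValidUtf8 and the sample payloads are hypothetical names and data made up for illustration.

```javascript
// Sketch only: strict UTF-8 validation of a payload, in the spirit of what a
// WebSocket receiver has to do for text frames. NOT the ws library's code.
// isValidUtf8 is a hypothetical helper name.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw on malformed input instead of
    // silently substituting U+FFFD replacement characters.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Made-up payloads for illustration.
const good = Buffer.from('{"name":"workstation-01"}', 'utf8');
const bad = Buffer.concat([
  Buffer.from('{"name":"', 'utf8'),
  Buffer.from([0xff, 0xff, 0xff]), // 0xFF never occurs in well-formed UTF-8
  Buffer.from('"}', 'utf8'),
]);

console.log(isValidUtf8(good)); // true
console.log(isValidUtf8(bad));  // false
```

A single malformed byte anywhere in the message is enough to fail the check, so one bad field in an otherwise normal JSON message could plausibly trigger the error on only some clients.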

@Coolguy3289 YES! that will defo be what's wrong. I did find this post, https://github.com/websockets/ws/issues/2252, which claims it may be a node issue. Can you try upgrading to the latest LTS version, which is node 22.14.0?

si458 avatar Mar 18 '25 19:03 si458

I'm getting the same messages, and seeing the same agent behavior after updating to v22.14.0. I also reinstalled the agent again just to be safe.

Coolguy3289 avatar Mar 18 '25 19:03 Coolguy3289

ok so now it must be a nginx proxy issue, what does ur config for that look like? are u using nginx proxy manager or normal nginx?

si458 avatar Mar 18 '25 19:03 si458

Normal nginx. However, again I must note this is only happening with select Linux clients. I don't see it on all Linux clients, and Windows clients also seem fine. nginx config below:

server {
        listen 80;
        listen [::]:80;
        server_name meshcentral.domain.com;
        if ($host = meshcentral.domain.com)
        {
                return 301 https://$host$request_uri;
        }
        location / {
                proxy_pass http://localhost:8000;
                proxy_http_version 1.1;

                proxy_set_header X-Forwarded-Host $host:$server_port;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
        }
}

server {
        listen 443 ssl;
        listen [::]:443 ssl;

        server_name meshcentral.domain.com;
        ssl_certificate /etc/ssl/astro/ssl_fullchain.pem;
        ssl_certificate_key /etc/ssl/astro/STAR-key.pem;
        #ssl_certificate /etc/ssl/astro/ssl_fullchain.pem;
        ssl_session_cache shared:WEBSSL:10m;
        ssl_session_timeout 999999s;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;

        proxy_send_timeout 330s;
        proxy_read_timeout 330s;

        location / {
                proxy_pass http://localhost:4430;
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                proxy_set_header Host $host;
                proxy_set_header CF-Connecting-IP $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Host $host:$server_port;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
        }
}

Coolguy3289 avatar Mar 18 '25 19:03 Coolguy3289

not sure if it matters, BUT change proxy_pass http://localhost:4430; to proxy_pass http://localhost:4430/; and also remove CF-Connecting-IP, then restart nginx

si458 avatar Mar 18 '25 19:03 si458

Changes made, nginx restarted, no change.

Coolguy3289 avatar Mar 18 '25 19:03 Coolguy3289

I'm literally not sure now, I'm out of ideas. I'll have to create a VM tomorrow and try to replicate your issue

si458 avatar Mar 18 '25 20:03 si458

After thorough debugging, it turns out this error is actually caused by the "board_asset_tag":"���¦" value being sent during hardware info collection.

Coolguy3289 avatar Mar 24 '25 14:03 Coolguy3289
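
For anyone wanting to check whether a machine has the same kind of bad value, here is a rough diagnostic sketch. It assumes the asset tag is exposed at /sys/class/dmi/id/board_asset_tag, the usual Linux sysfs location for DMI fields. Whether the agent reads exactly this file is an assumption, not something confirmed in this thread. checkUtf8File is a hypothetical helper name.

```javascript
const fs = require('node:fs');

// Hypothetical helper: reads a file as raw bytes and reports whether those
// bytes form valid UTF-8, plus a hex dump for inspection.
function checkUtf8File(path) {
  let bytes;
  try {
    bytes = fs.readFileSync(path); // Buffer of raw bytes, no decoding
  } catch (err) {
    // File missing or not readable by the current user.
    return { path, readable: false, valid: null, hex: null };
  }
  let valid = true;
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    valid = false;
  }
  return { path, readable: true, valid, hex: bytes.toString('hex') };
}

// Assumed path (see above); adjust for other DMI fields as needed.
console.log(checkUtf8File('/sys/class/dmi/id/board_asset_tag'));
```

If the result shows valid: false, the hex dump shows the exact bytes the firmware reports, which may be useful when reporting the board model.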

@Coolguy3289 look what i just discovered! https://github.com/Ylianst/MeshAgent/issues/141 the same issue as you! and i think also the same type of device too!

si458 avatar Mar 24 '25 22:03 si458

is this still an issue or can it be closed?

si458 avatar Jun 21 '25 10:06 si458