
Boundary v0.14 - error fetching connection to send session teardown request

Open rfc2119 opened this issue 6 months ago • 0 comments

Describe the bug: At random points in time, all clients (Desktop and CLI) that are already connected through the Kubernetes worker become unable to connect. After roughly 5 minutes, connectivity returns and clients can connect to the requested targets again. This affects all targets and all clients.

On the Desktop client, I can't seem to find any relevant logs. On the CLI client, here are the logs:

error fetching connection to send session teardown request to worker: Error dialing the worker: failed to WebSocket dial: failed to send handshake request: Get "http://public-k8s-worker:9202/v1/proxy": context deadline exceeded
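Since the error is a timeout on the WebSocket handshake request, one way to narrow it down is to check whether a plain TCP connection to the worker's proxy address still succeeds while the problem is occurring. The following is a minimal sketch, not part of Boundary: `probe_tcp` is a hypothetical helper, and the host and port in the usage note are simply taken from the log line above.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect and report (reachable, seconds_elapsed).

    Hypothetical diagnostic helper: it only tests TCP reachability and
    does not perform the WebSocket handshake that the client attempts.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False, time.monotonic() - start
```

Running `probe_tcp("public-k8s-worker", 9202)` in a loop during an outage would show whether the port is unreachable or whether the TCP connection succeeds and only the handshake stalls.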

This error keeps repeating until the connection/session comes back online. Polling /worker-info on the Kubernetes worker reports "READY" for the gRPC upstream connection state:

{
  "worker_process_info": {
    "state": "active",
    "active_session_count": 14,
    "session_connections": {
      "s_9MntRSIZK6": 15,
      "s_AJCsJ4ELib": 16,
      "s_F0N0pBIorY": 11,
      "s_U8d9dLbKlx": 3,
      "s_Xga4gGHkFK": 11,
      "s_Zgtv1w90w6": 47,
      "s_afFHlpg6H7": 16,
      "s_ap8gJvRmlv": 8,
      "s_c0CL0p1SPO": 8,
      "s_dae6ijxKrP": 25,
      "s_geWN5MEe9k": 10,
      "s_r0L1lfVe54": 13,
      "s_re1VmSCNjf": 2,
      "s_z2BOSqwAs6": 2
    },
    "upstream_connection_state": "READY"
  }
}
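To track this over time, the /worker-info response can be reduced to a few numbers and logged at a fixed interval. The sketch below assumes only the JSON shape shown above; `summarize_worker_info` and `fetch_worker_info` are hypothetical helpers, and the URL passed to `fetch_worker_info` (scheme, host, and port of the worker's ops listener) depends on the deployment and is not taken from this report.

```python
import json
import urllib.request


def summarize_worker_info(payload):
    """Reduce a /worker-info JSON document to a few comparable values."""
    info = json.loads(payload)["worker_process_info"]
    connections = info.get("session_connections", {})
    return {
        "state": info.get("state"),
        "upstream": info.get("upstream_connection_state"),
        "reported_sessions": info.get("active_session_count"),
        # Number of session IDs actually listed in the payload.
        "listed_sessions": len(connections),
        # Sum of per-session connection counts.
        "total_connections": sum(connections.values()),
    }


def fetch_worker_info(url, timeout=5.0):
    """Fetch and summarize worker info; the URL is deployment-specific (assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return summarize_worker_info(response.read().decode("utf-8"))
```

For the payload above, this summary would report 14 listed sessions (matching `active_session_count`) and 187 connections in total. Comparing summaries taken before, during, and after an outage could show whether the worker's own view changes while clients are timing out.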

To Reproduce: There are no known reproduction steps. The issue occurs at random, with no fixed interval and no consistent frequency.

Expected behavior: Open sessions should not be interrupted.

Additional context:
Worker version: v0.15.4
Controller version: v0.14.5
CLI version: 0.14.3
Deployment: 3 controllers in an HA setup and one Kubernetes worker

rfc2119, Jul 29 '24 11:07