[Bug] CLI could not connect to a server

Open YouSysAdmin opened this issue 9 months ago • 7 comments

Is this a support request?

  • [x] This is not a support request

Is there an existing issue for this?

  • [x] I have searched the existing issues

Current Behavior

headscale nodes list

2025-03-12T13:56:11+02:00 FTL ../../../../../../home/runner/work/headscale/headscale/cmd/headscale/cli/utils.go:124 > Could not connect: context deadline exceeded error="context deadline exceeded"

Expected Behavior

list of nodes

Steps To Reproduce

  1. Create the config file and set the CLI environment variables:

touch ~/.headscale/config.yaml
export HEADSCALE_CLI_API_KEY=************
export HEADSCALE_CLI_ADDRESS=access*****:443

  2. Execute headscale nodes list

  3. Check that gRPC is working correctly:

grpcurl -H "authorization: Bearer ${HEADSCALE_CLI_API_KEY}" "${HEADSCALE_CLI_ADDRESS}" 'headscale.v1.HeadscaleService.ListNodes'
"nodes": [
    {
      "id": "5",
      "machineKey": "mkey:**********",
      "nodeKey": "nodekey:**********",
      "discoKey": "discokey:**********",
      "ipAddresses": [
        "100.64.0.2",
        "fd7a:115c:a1e0::2"
      ],
.....

Environment

- OS: Server Kubernetes (used official image) / Client MacOS ARM64
- Headscale Server: 0.25.1 / Client 0.25.1
- Tailscale version:

Runtime environment

  • [x] Headscale is behind a (reverse) proxy
  • [x] Headscale runs in a container

Anything else?

Only remote CLI is affected, all other functions work correctly.

Update: I tested older versions. The latest working version is 0.23.0: it connects and can set the policy.

YouSysAdmin avatar Mar 12 '25 12:03 YouSysAdmin

I tested the remote-cli with 0.25.1 as described in the docs (without reverse proxy or container) and it works.

local config testing.yml:

cli:
  address: headscale.example.com:50443
  api_key: rS-0soL.8OfdRblablablbablablblazYX5kd

Invocation: ./headscale_0.25.1_linux_amd64 -c testing.yml user list

Please note that the address has to be configured without http:// or https://. Can you please check your configuration again?
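
As an illustration of the address format, here is a minimal Go sketch that strips an accidental http:// or https:// prefix and checks that the remainder is a host:port pair. The helper name normalizeCLIAddress is hypothetical and is not part of headscale; it only shows the kind of check you can run on your own value before putting it into cli.address or HEADSCALE_CLI_ADDRESS.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeCLIAddress is a hypothetical helper (NOT part of headscale).
// It removes a leading http:// or https:// and a trailing slash, then
// verifies that what remains parses as host:port.
func normalizeCLIAddress(raw string) (string, error) {
	addr := strings.TrimSpace(raw)
	for _, scheme := range []string{"https://", "http://"} {
		addr = strings.TrimPrefix(addr, scheme)
	}
	addr = strings.TrimSuffix(addr, "/")

	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", fmt.Errorf("address %q is not host:port: %w", raw, err)
	}
	if host == "" || port == "" {
		return "", fmt.Errorf("address %q is missing host or port", raw)
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	inputs := []string{
		"https://headscale.example.com:50443", // scheme gets stripped
		"headscale.example.com:50443",         // already in the expected form
		"headscale.example.com",               // rejected: no port
	}
	for _, in := range inputs {
		out, err := normalizeCLIAddress(in)
		fmt.Printf("%q -> %q (err: %v)\n", in, out, err)
	}
}
```

The sketch only validates the shape of the address. It does not tell you whether the port actually serves gRPC.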

nblock avatar Mar 15 '25 07:03 nblock

Hi @nblock I get the same result when using a ~/.headscale/config.yaml file. All config variables are set correctly, judging by the trace output and additional debug output (inside the utils/newHeadscaleCLIWithConfig function).

I have compiled a conditionally working version by downgrading some packages (I haven't tested all the CLI functions):

diff --git a/go.mod b/go.mod
index ecf94318..d75df6da 100644
--- a/go.mod
+++ b/go.mod
@@ -16,7 +16,7 @@ require (
        github.com/google/go-cmp v0.6.0
        github.com/gorilla/mux v1.8.1
        github.com/grpc-ecosystem/go-grpc-middleware v1.4.0
-       github.com/grpc-ecosystem/grpc-gateway/v2 v2.24.0
+       github.com/grpc-ecosystem/grpc-gateway/v2 v2.22.0
        github.com/jagottsicher/termcolor v1.0.2
        github.com/klauspost/compress v1.17.11
        github.com/oauth2-proxy/mockoidc v0.0.0-20240214162133-caebfff84d25
@@ -42,8 +42,8 @@ require (
        golang.org/x/net v0.34.0
        golang.org/x/oauth2 v0.25.0
        golang.org/x/sync v0.10.0
-       google.golang.org/genproto/googleapis/api v0.0.0-20241216192217-9240e9c98484
-       google.golang.org/grpc v1.69.0
+       google.golang.org/genproto/googleapis/api v0.0.0-20240903143218-8af14fe29dc1
+       google.golang.org/grpc v1.66.0
        google.golang.org/protobuf v1.36.0
        gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c
        gopkg.in/yaml.v3 v3.0.1

routes list output:

> go run cmd/headscale/headscale.go -c ~/.headscale/config.yaml routes list
2025-03-15T14:23:14+02:00 DBG Setting timeout timeout=5000
2025-03-15T14:23:14+02:00 TRC cmd/headscale/cli/utils.go:121 > Connecting via gRPC address=my.server.example:443
ID  | Node                                  | Prefix           | Advertised | Enabled | Primary
209 | vpn-router                       | 10.1.4.0/24    | true       | true    | true
210 | vpn-router                        | 10.1.0.0/16     | true       | true    | true
211 | vpn-router                         | 10.2.0.0/16     | true       | true    | true
212 | vpn-router                        | 10.30.0/16     | true       | true    | true

graph TD
    A[Traefik 443 headscale.example.com] --> C[8080]
    B[Traefik 443 grpc-headscale.example.com]--> D[50443 h2c]
    C --> E
    D --> E
    E(headscale container HTTP 8080, GRPC 50443)

YouSysAdmin avatar Mar 15 '25 12:03 YouSysAdmin

Please try without a reverse proxy in between.

nblock avatar Mar 15 '25 12:03 nblock

@nblock This headscale instance runs inside a Kubernetes cluster, and external connections are possible only via Traefik.

I used kubectl port-forward to forward port 50443 to my local machine and tried again. It does not work for any CLI version (regardless of whether the configuration comes from the environment or from a file):

❯ HEADSCALE_CLI_INSECURE=1 HEADSCALE_CLI_ADDRESS=127.0.0.1:50443 headscale node list
2025-03-16T12:23:58+02:00 FTL ../../../home/runner/work/headscale/headscale/cmd/headscale/cli/utils.go:124 > Could not connect: context deadline exceeded error="context deadline exceeded"

grpcurl works fine:

grpcurl -plaintext -H 'authorization: Bearer TOKEN' 127.0.0.1:50443 headscale.v1.HeadscaleService.GetRoutes

headscale server config:

server_url: https://access.example.com
listen_addr: 0.0.0.0:8080
grpc_listen_addr: 0.0.0.0:50443
grpc_allow_insecure: true

YouSysAdmin avatar Mar 16 '25 10:03 YouSysAdmin

Same here, I'm behind Traefik but have Tailscale running on my nodes so CAN use that perfectly.

This is my config ...

$ (mbp-linux) cat .headscale/config.yaml
cli:
    address: headscale.mydomain.uk:443
    api_key: xxxxxxxxxxxxxxxxxxxxxxxxxxLtSX9_hWC-sz

I'm getting this error ...

$ (mbp-linux) headscale nodes list
Cannot get nodes: unexpected HTTP status code received from server: 404 (Not Found); malformed header: missing HTTP content-type

I have tried swapping the URL for a Tailscale IP and port that I can telnet to ...

$ (mbp-linux) telnet 100.64.0.4 8080
Trying 100.64.0.4...
Connected to 100.64.0.4.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

... and put this in my config ...

$ (mbp-linux) cat .headscale/config.yaml
cli:
    address: 100.64.0.4:8080
    api_key: xxxxxxxxxxxxxxxxxxxxxxxxxxLtSX9_hWC-sz

... and this time I get a different error ...

$ (mbp-linux) headscale nodes list
2025-05-23T10:58:58+01:00 FTL ../runner/work/headscale/headscale/cmd/headscale/cli/utils.go:124 > Could not connect: context deadline exceeded error="context deadline exceeded"

Any clue?

FYI I have headscale-admin working fine on Traefik.

Thanks,

Paully

plittlefield avatar May 23 '25 10:05 plittlefield

Hi @plittlefield I don't have any solution for it. I just use v0.23 to set a policy, etc., and it works fine. :)

There may be some dependencies on headers that are not passed through Traefik to headscale, but I haven't had time to conduct this research across the two versions of the GRPC package.

P.S. I haven't tested version 0.26 yet.

YouSysAdmin avatar May 23 '25 10:05 YouSysAdmin

I am using a Docker-based Let's Encrypt + HAProxy + headscale (v0.26.1) setup, and the macOS client works fine for me. I am using the standard Tailscale package from their official website (Tailscale-1.84.1-macos.pkg). My headscale configuration in HAProxy is simple, nothing fancy. The backend config looks like this:

backend headscale_backend
    mode http
    server headscale_server 172.17.0.1:8080 check

ozhankaraman avatar Jun 08 '25 12:06 ozhankaraman

@ozhankaraman This issue is not about the client. It is only about the remote CLI of Headscale (gRPC) :)

I have no idea what the problem is here.

Headscale and CLI v0.26.1 (get/set policy):

  • Docker compose + Traefik - no problem
  • Kubernetes + Traefik - Could not connect: context deadline exceeded error="context deadline exceeded"
  • GRPCURL - no problem

Headscale v0.26.1 + CLI v0.23.0 (get/set policy):

  • Docker compose + Traefik - no problem
  • Kubernetes + Traefik - no problem
  • GRPCURL - no problem

This definitely started after the gRPC library version was updated, but I still haven't found the cause or how to fix it. Interestingly, this gRPC problem only occurs with Headscale, and it's not clear why, since I have many other gRPC-based tools. I need to dig through the changes after google.golang.org/grpc v1.66.0.

# WWW -> AWS NLB -> Kube Node -> Traefik -> SVC -> POD
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
  name: ingress-route
spec:
  routes:
    - kind: Rule
      match: Host(`access.example.co`)
      middlewares:
        - name: headscale-cors-middleware
          namespace: headscale
      priority: 10
      services:
        - kind: Service
          name: headscale-svc
          port: 8080
          scheme: http
    - kind: Rule
      match: Host(`access-grpc.example.co`)
      priority: 10
      services:
        - kind: Service
          name: headscale-svc
          port: 50443
          scheme: h2c
          passHostHeader: true

YouSysAdmin avatar Jul 29 '25 15:07 YouSysAdmin

Hi @nblock I found the problem. Falling bombs with five hundred kilograms of TNT clear my brain well :D

~~As a temporary solution, we can use the following:~~ ~~GRPC_ENFORCE_ALPN_ENABLED=false headscale get policy [etc]~~ This is related to enforcement of the ALPN protocol: https://github.com/grpc/grpc-go/issues/434

This is actually a misconfiguration of the AWS Network Load Balancer.

fix:

  1. In the main navigation panel, under Load Balancing, choose Load Balancers.
  2. Click inside the Filter by tags and attributes or search by keyword box, select Type and choose network to list the Network Load Balancers available in the current AWS region.
  3. Select the Network Load Balancer (NLB) that you want to examine.
  4. Select the Listeners tab from the console bottom panel to access the load balancer listeners.
  5. Select the TLS : 443 listener and choose Edit to access the TLS listener configuration.
  6. In the Listener details section, check the name of the policy selected for ALPN Policy. If no TLS ALPN policy is configured for the selected listener and ALPN Policy is set to None, change this option to HTTP2Preferred.

If using HTTP2Preferred is not possible for you, you can set an additional environment variable for the client:

GRPC_ENFORCE_ALPN_ENABLED=false headscale get policy [etc]
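
To check whether a TLS endpoint actually selects h2 via ALPN, a small probe using only the Go standard library may help. This is a minimal sketch, not headscale code: the function names negotiatedALPN and describeALPN are made up for illustration. It assumes that newer grpc-go releases refuse a TLS connection when the server selects no ALPN protocol, which is what GRPC_ENFORCE_ALPN_ENABLED=false appears to relax.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"time"
)

// describeALPN turns the negotiated protocol into a short verdict.
// (Hypothetical helper for this sketch.)
func describeALPN(proto string) string {
	switch proto {
	case "h2":
		return "ok: server selected h2 via ALPN"
	case "":
		return "problem: server selected no ALPN protocol"
	default:
		return "problem: server selected " + proto + " instead of h2"
	}
}

// negotiatedALPN opens a TLS connection that offers only "h2" and
// returns the protocol the server picked ("" if it picked none).
func negotiatedALPN(addr string) (string, error) {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{
		NextProtos: []string{"h2"}, // what a gRPC client offers over TLS
	})
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.ConnectionState().NegotiatedProtocol, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: alpncheck host:port")
		return
	}
	proto, err := negotiatedALPN(os.Args[1])
	if err != nil {
		fmt.Println("TLS handshake failed:", err)
		return
	}
	fmt.Println(describeALPN(proto))
}
```

Run it against the public gRPC address (for example, go run alpncheck.go access-grpc.example.co:443). With the listener's ALPN Policy set to None, the probe should report that no protocol was selected. After switching to HTTP2Preferred, it should report h2.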

YouSysAdmin avatar Jul 29 '25 16:07 YouSysAdmin