[WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=
Describe the bug
Hello, our team is encountering the issue described below in our three-node Vault High Availability (HA) cluster deployed on Kubernetes. The Vault version in use is v1.15.2. Despite restarting the Vault cluster and attempting to promote a standby node to leader, the issue persists.
We are reaching out for suggestions and help in identifying the root cause. As this is occurring in our production environment, any insights or recommendations would be greatly appreciated. Thanks
To Reproduce
Steps to reproduce the behavior:
- Run
  helm install vault-server hashicorp/vault -f values.yml -n vault-server
- Run
  kubectl -n vault-server exec vault-server-0 -- vault operator init -format=json > cluster-keys.json
- Run
  kubectl -n vault-server exec -ti vault-server-1 -- vault operator raft join http://vault-server-0.vault-server-internal:8200
- Run
  kubectl -n vault-server exec -ti vault-server-2 -- vault operator raft join http://vault-server-0.vault-server-internal:8200
- Create a kv secret in the Vault server and inject it into a Kubernetes deployment.
- See the logs of the Vault nodes below ([WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=)

==> Vault server configuration:
Administrative Namespace:
Api Address: http://x.x.x.x:8200
Cgo: disabled
Cluster Address: https://vault-server-2.vault-server-internal:8201
Environment Variables: GODEBUG, GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_PROJECT, GOOGLE_REGION, HOME, HOSTNAME, HOST_IP, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, PATH, POD_IP, PWD, SHLVL, SKIP_CHOWN, SKIP_SETCAP, VAULT_ADDR, VAULT_API_ADDR, VAULT_CLUSTER_ADDR, VAULT_K8S_NAMESPACE, VAULT_K8S_POD_NAME, VAULT_SECRETS_SERVER_ACTIVE_PORT, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP_ADDR, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP_PORT, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP_PROTO, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP_ADDR, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP_PORT, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP_PROTO, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_HOST, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_PORT, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_PORT_HTTP, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_PORT_HTTPS_INTERNAL, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP_ADDR, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP_PORT, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP_PROTO, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_SERVICE_HOST, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_SERVICE_PORT, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_SERVICE_PORT_HTTPS, VAULT_SECRETS_SERVER_PORT, VAULT_SECRETS_SERVER_PORT_8200_TCP, VAULT_SECRETS_SERVER_PORT_8200_TCP_ADDR, VAULT_SECRETS_SERVER_PORT_8200_TCP_PORT, VAULT_SECRETS_SERVER_PORT_8200_TCP_PROTO, VAULT_SECRETS_SERVER_PORT_8201_TCP, VAULT_SECRETS_SERVER_PORT_8201_TCP_ADDR, VAULT_SECRETS_SERVER_PORT_8201_TCP_PORT, VAULT_SECRETS_SERVER_PORT_8201_TCP_PROTO, VAULT_SECRETS_SERVER_SERVICE_HOST, VAULT_SECRETS_SERVER_SERVICE_PORT, VAULT_SECRETS_SERVER_SERVICE_PORT_HTTP, VAULT_SECRETS_SERVER_SERVICE_PORT_HTTPS_INTERNAL, VAULT_SECRETS_SERVER_STANDBY_PORT, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP_ADDR, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP_PORT, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP_PROTO, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP_ADDR, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP_PORT, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP_PROTO, VAULT_SECRETS_SERVER_STANDBY_SERVICE_HOST, VAULT_SECRETS_SERVER_STANDBY_SERVICE_PORT, VAULT_SECRETS_SERVER_STANDBY_SERVICE_PORT_HTTP, VAULT_SECRETS_SERVER_STANDBY_SERVICE_PORT_HTTPS_INTERNAL, VERSION
Go Version: go1.21.3
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level:
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.15.2, built 2023-11-06T11:33:28Z
Version Sha: cf1b5cafa047bc8e4a3f93444fcb4011593b92cb
2024-01-19T17:01:31.979Z [WARN] unknown or unsupported field cluster_addr found in configuration at /tmp/storageconfig.hcl:10:1
2024-01-19T17:01:31.979Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2024-01-19T17:01:31.981Z [WARN] storage.raft.fsm: raft FSM db file has wider permissions than needed: needed=-rw------- existing=-rw-rw----
2024-01-19T17:01:32.928Z [INFO] incrementing seal generation: generation=1
2024-01-19T17:01:32.928Z [INFO] core: Initializing version history cache for core
2024-01-19T17:01:32.928Z [INFO] events: Starting event system
2024-01-19T17:01:32.929Z [INFO] core: stored unseal keys supported, attempting fetch

==> Vault server started! Log data will stream in below:

2024-01-19T17:01:33.002Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
2024-01-19T17:01:33.002Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2024-01-19T17:01:33.003Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"db3afed9-90ea-e8d8-855c-c87d11e87cce", NotifyCh:(chan<- bool)(0xc0035d30a0), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc0030201e0), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2024-01-19T17:01:33.006Z [INFO] storage.raft: initial configuration: index=65 servers="[{Suffrage:Voter ID:eecd350a-f2a7-403f-9702-619ba8cebe40 Address:vault-server-0.vault-server-internal:8201} {Suffrage:Voter ID:9b285315-469d-d3c1-be5f-9ba1a41f6760 Address:vault-server-1.vault-server-internal:8201} {Suffrage:Voter ID:db3afed9-90ea-e8d8-855c-c87d11e87cce Address:vault-server-2.vault-server-internal:8201}]"
2024-01-19T17:01:33.006Z [INFO] core: vault is unsealed
2024-01-19T17:01:33.006Z [INFO] core: unsealed with stored key
2024-01-19T17:01:33.006Z [INFO] storage.raft: entering follower state: follower="Node at vault-server-2.vault-server-internal:8201 [Follower]" leader-address= leader-id=
2024-01-19T17:01:33.006Z [INFO] core: entering standby mode
2024-01-19T17:01:48.023Z [WARN] storage.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2024-01-19T17:01:48.023Z [INFO] storage.raft: entering candidate state: node="Node at vault-server-2.vault-server-internal:8201 [Candidate]" term=32
2024-01-19T17:01:48.079Z [INFO] storage.raft: election won: term=32 tally=2
2024-01-19T17:01:48.079Z [INFO] storage.raft: entering leader state: leader="Node at vault-server-2.vault-server-internal:8201 [Leader]"
2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=eecd350a-f2a7-403f-9702-619ba8cebe40
2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=9b285315-469d-d3c1-be5f-9ba1a41f6760
2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter 9b285315-469d-d3c1-be5f-9ba1a41f6760 vault-server-1.vault-server-internal:8201}"
2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter eecd350a-f2a7-403f-9702-619ba8cebe40 vault-server-0.vault-server-internal:8201}"
2024-01-19T17:01:48.095Z [INFO] core: acquired lock, enabling active operation
2024-01-19T17:01:48.116Z [INFO] core: post-unseal setup starting
2024-01-19T17:01:48.138Z [INFO] core: loaded wrapping token key
2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin runtime catalog
2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin catalog: plugin-directory=""
2024-01-19T17:01:48.171Z [INFO] core: successfully mounted: type=system version="v1.15.2+builtin.vault" path=sys/ namespace="ID: root. Path: "
2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=identity version="v1.15.2+builtin.vault" path=identity/ namespace="ID: root. Path: "
2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=kv version="v0.16.1+builtin" path=secret/ namespace="ID: root. Path: "
2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=cubbyhole version="v1.15.2+builtin.vault" path=cubbyhole/ namespace="ID: root. Path: "
2024-01-19T17:01:48.294Z [INFO] core: successfully mounted: type=token version="v1.15.2+builtin.vault" path=token/ namespace="ID: root. Path: "
2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=kubernetes version="v0.17.1+builtin" path=kubernetes/ namespace="ID: root. Path: "
2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=userpass version="v1.15.2+builtin.vault" path=userpass/ namespace="ID: root. Path: "
2024-01-19T17:01:48.311Z [INFO] rollback: Starting the rollback manager with 256 workers
2024-01-19T17:01:48.311Z [INFO] rollback: starting rollback manager
2024-01-19T17:01:48.312Z [INFO] core: restoring leases
2024-01-19T17:01:48.394Z [INFO] expiration: lease restore complete
2024-01-19T17:01:48.395Z [INFO] identity: entities restored
2024-01-19T17:01:48.418Z [INFO] identity: groups restored
2024-01-19T17:01:48.445Z [INFO] core: starting raft active node
2024-01-19T17:01:48.445Z [INFO] storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s false redundancy_zone upgrade_version}" reconcile_interval=0s
2024-01-19T17:01:48.470Z [INFO] core: usage gauge collection is disabled
2024-01-19T17:01:48.718Z [INFO] core: post-unseal setup complete
2024-01-19T17:04:23.948Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:14:19.065Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:29:23.960Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:29:23.977Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:34:19.057Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:54:19.054Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
Expected behavior
There should be no TLS-related warnings or errors in the Vault cluster logs.
Environment:
- Vault Server Version (retrieve with vault status):

/ $ vault status
Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.15.2
Build Date               2023-11-06T11:33:28Z
Storage Type             raft
Cluster Name             vault-cluster-eeed0108
Cluster ID               64d0461e-ac12-ca0c-7bf2-b10ab4629939
HA Enabled               true
HA Cluster               https://vault-server-2.vault-server-internal:8201
HA Mode                  active
Active Since             2024-01-19T17:01:48.116967849Z
Raft Committed Index     52557
Raft Applied Index       52557
- Vault CLI Version (retrieve with vault version): Vault v1.15.2
- Server Operating System/Architecture: Kubernetes on DigitalOcean
Vault server configuration file(s): values.yaml for the vault helm chart:

server:
  extraEnvironmentVars:
    GOOGLE_REGION: $some_value
    GOOGLE_PROJECT: $some_value
    GOOGLE_APPLICATION_CREDENTIALS: $some_value
  extraVolumes:
    - type: 'secret'
      name: '$some_value'
  dataStorage:
    enabled: true
    # Size of the PVC created
    size: $some_value
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        ui = true
        storage "raft" {
          path = "/vault/data"
        }
        listener "tcp" {
          address = "0.0.0.0:8200"
          cluster_addr = "0.0.0.0:8201"
          tls_disable = "true"
        }
        seal "gcpckms" {
          credentials = "$some_value"
          project = "$some_value"
          region = "$some_value"
          key_ring = "$some_value"
          crypto_key = "$some_value"
        }
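One detail worth flagging in the config above: the startup logs report `[WARN] unknown or unsupported field cluster_addr found in configuration`. Inside a `listener "tcp"` stanza the parameter is spelled `cluster_address`; `cluster_addr` is only valid at the top level of the server config, where it should be an address other nodes can reach. A sketch of the corrected listener stanza (the surrounding config is unchanged; the top-level example address is an assumption based on the pod names in this thread):

```hcl
# Inside the listener stanza the key is cluster_address, not cluster_addr.
listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable     = "true"
}

# cluster_addr, by contrast, is a top-level parameter advertising this node's
# cluster address to its peers, e.g. (hypothetical value):
# cluster_addr = "https://vault-server-0.vault-server-internal:8201"
```

This does not necessarily explain the ALPN warnings by itself, but it removes one source of misconfiguration the server is already complaining about.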
Seeing the same issue on EKS 1.29, Vault 1.15.6, Postgres backend. vault status looks fine and HA appears to work (e.g. killing the active node causes a standby to be promoted). Based on other threads I suspect it relates to health checks or liveness probes, but I have taken those from the official helm charts, so I'm not sure how to adjust them to eliminate this confusing logging.
Same issue after upgrading Kubernetes from v1.27.x to v1.28.2:
2024-04-01T02:32:21.599Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2024-04-01T02:32:33.396Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:32:33.396Z [ERROR] core: forward request error: error="error during forwarding RPC request"
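One quick way to check whether the standby nodes agree on who the active node is, is the unauthenticated sys/leader endpoint. This is a hedged sketch; the pod names, namespace, and availability of wget in the container are assumptions based on the default helm chart:

```shell
# Ask each node who it thinks the leader is (no token required).
for i in 0 1 2; do
  kubectl -n vault exec vault-$i -- \
    wget -qO- http://127.0.0.1:8200/v1/sys/leader
  echo
done
# Compare leader_address / leader_cluster_address across the nodes; a node
# whose view disagrees with the others is typically the one logging
# forwarding errors.
```
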
Sadly, the same version worked in staging but broke prod. Everything is the same in prod, so I'm not sure what's going on here. Even upgrading the helm chart (vault 1.25.2 and 1.16) didn't help, and the vault healthcheck is also green :(
2024-04-01T02:36:59.653Z [INFO] events: Starting event system
2024-04-01T02:36:59.654Z [INFO] core: stored unseal keys supported, attempting fetch
2024-04-01T02:36:59.668Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=[::]:8201
2024-04-01T02:36:59.669Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2024-04-01T02:36:59.669Z [INFO] core: vault is unsealed
2024-04-01T02:36:59.669Z [INFO] core: entering standby mode
2024-04-01T02:36:59.767Z [INFO] core: unsealed with stored key
2024-04-01T02:37:40.306Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:37:40.306Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2024-04-01T02:38:39.039Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:38:39.039Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2024-04-01T02:39:27.125Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:39:27.126Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2024-04-01T02:39:50.468Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:39:50.468Z [ERROR] core: forward request error: error="error during forwarding RPC request"
vault status looks ok:
vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 5
Threshold 3
Version 1.15.2
Build Date 2023-11-06T11:33:28Z
Storage Type mysql
Cluster Name vault-cluster-639b256d
Cluster ID 0a28b613-6e6b-48a8-f3eb-4c7fb59882d9
HA Enabled true
HA Cluster https://vault-0.vault-internal:8201
HA Mode standby
Active Node Address http://10.244.1.135:8200
TLS is also disabled:
Go Version: go1.21.3
Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
and
listener "tcp" {
tls_disable = 1
address = "[::]:8200"
cluster_address = "[::]:8201"
}
From the debug logs:
2024-04-01T03:41:06.811Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2024-04-01T03:41:06.811Z [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
2024-04-01T03:41:06.812Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T03:41:07.628Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
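Context that may help interpret these logs: the cluster port (8201) always speaks TLS with an internally generated certificate, even when `tls_disable` is set on the API listener, and it expects connections that negotiate Vault's request-forwarding ALPN value. Anything else that opens a connection to 8201, such as a load balancer or kubelet TCP health check, can trigger exactly this `no TLS config found for ALPN` warning. A hedged way to inspect what a node presents on 8201 (the hostname is an assumption based on default chart naming):

```shell
# Connect to the cluster port with Vault's request-forwarding ALPN value and
# dump the certificate the node presents. Connecting WITHOUT -alpn should make
# the node log "no TLS config found for ALPN: ALPN=[]".
openssl s_client -connect vault-0.vault-internal:8201 \
  -alpn req_fw_sb-act_v1 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -enddate
```
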
Same issue here; I see a bad-certificate error in the clustering between nodes:
{"@level":"debug","@message":"forwarding: error sending echo request to active node","@module":"core","@timestamp":"2024-04-21T17:56:20.646062Z","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: tls: failed to verify certificate: x509: certificate is valid for fw-c662f8a7-1ff3-33e9-c4fb-28a37bcbdf43, not fw-fd671ed2-3dbe-2f7a-7433-bb1ac5d3d632\""}
Also seeing this issue as well running a 3 node cluster on EKS 1.26 (with a DynamoDB backend) & GKE 1.27 (running a GCS backend) both running Vault 1.16.2.
https://github.com/hashicorp/vault/issues/10395 — this closed issue also seems to document the behaviour, and suggests the root cause is being unable to correctly select the leader/active node.
We have found that increasing the replicas to 5 sometimes helps elect a leader correctly, or alternatively scaling down to a single node and then adding each node back one at a time until we reach the desired 3 replicas. Otherwise the pods continue to fail the liveness probes and end up in a crash loop.
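The scale-down/up recovery described above can be sketched roughly as follows. The statefulset and namespace names are assumptions, and with raft integrated storage you may also need to remove dead peers (`vault operator raft remove-peer`) rather than just scaling:

```shell
# Shrink the cluster to a single node and wait for it to become active.
kubectl -n vault scale statefulset vault --replicas=1
kubectl -n vault exec vault-0 -- vault status   # HA Mode should read "active"

# Then add the remaining nodes back one at a time.
for n in 2 3; do
  kubectl -n vault scale statefulset vault --replicas=$n
  kubectl -n vault rollout status statefulset vault
done
```
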
Hello,
We are having similar problems with our Vault deployment installed with helm chart.
Currently, we have two environments (staging and production) with the same vault configuration.
Staging environment is running on k8s 1.30 and vault is installed using chart 0.28.1. This configuration seems to work even though occasionally we are getting:
core: error during forwarded RPC request: error="rpc error: code = Canceled desc = context canceled"
core: forward request error: error="error during forwarding RPC request"
Production is the problematic one (k8s 1.29): we had to downgrade to chart version 0.24.1 (from 0.28.1) because our pods started restarting, causing Vault to become sealed. (The restarts were caused by the helm chart's default liveness probes failing.)
Error/warning logs started occurring a few minutes after the pods started. They are the same as above, with the addition of the following:
core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Vault configuration for both environments:
server:
  annotations:
    ad.datadoghq.com/vault.logs: XXXXXXXXXX
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: XXXXXXXXX
  volumes:
    - name: node-cert
      secret:
        secretName: vault-node-cert
  volumeMounts:
    - mountPath: /etc/certs
      name: node-cert
      readOnly: true
  ha:
    enabled: true
    replicas: 2
    apiAddr: https://vault.{{ .Values.dns_subdomain }}:443
    disruptionBudget:
      maxUnavailable: 1
    config: |
      ui = true
      listener "tcp" {
        tls_cert_file = "/etc/certs/tls.crt"
        tls_key_file = "/etc/certs/tls.key"
        tls_client_ca_file = "/etc/certs/ca.crt"
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      seal "awskms" {
        region = "{{ .Values.aws.region }}"
        kms_key_id = "{{ .Values.aws.kms_key_id }}"
      }
      storage "dynamodb" {
        ha_enabled = "true"
        region = "{{ .Values.aws.region }}"
        table = "{{ .Values.env_name }}-apps-vault-data"
      }
      service_registration "kubernetes" {}
  service:
    enabled: true
    port: 8200
    targetPort: 8200
In both cases we are advertising the API address through a Load Balancer.
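One hedged thing to check with this load-balancer setup: `api_addr` is the address standbys use to redirect and forward client requests to the active node. If every node advertises the same shared LB hostname, forwarded cluster connections can land on the wrong node and fail the internal TLS handshake, which would be consistent with the `certificate is valid for fw-..., not fw-...` error earlier in this thread. A sketch of advertising per-pod addresses instead, using an environment variable the chart already injects; the service name and exact values here are assumptions, not a verified fix:

```yaml
server:
  ha:
    # Let each pod advertise itself instead of the shared LB hostname.
    # VAULT_K8S_POD_NAME is populated by the chart via the downward API.
    apiAddr: "https://$(VAULT_K8S_POD_NAME).vault-internal:8200"
    clusterAddr: "https://$(VAULT_K8S_POD_NAME).vault-internal:8201"
```

The external LB address would then only be used by clients, not for node-to-node forwarding.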