[WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=
Describe the bug
Hello, our team is encountering the issue described below in our three-node Vault High Availability (HA) cluster deployed on Kubernetes. The Vault version in use is v1.15.2. Despite restarting the Vault cluster and attempting to promote a standby node to leader, the issue persists.
We are reaching out for suggestions and help in identifying the root cause. As this is occurring in our production environment, any insights or recommendations would be greatly appreciated. Thanks
To Reproduce
Steps to reproduce the behavior:
- Run
  helm install vault-server hashicorp/vault -f values.yml -n vault-server
- Run
  kubectl -n vault-server exec vault-server-0 -- vault operator init -format=json > cluster-keys.json
- Run
  kubectl -n vault-server exec -ti vault-server-1 -- vault operator raft join http://vault-server-0.vault-server-internal:8200
- Run
  kubectl -n vault-server exec -ti vault-server-2 -- vault operator raft join http://vault-server-0.vault-server-internal:8200
- Create a kv secret in the Vault server and inject it into a Kubernetes deployment.
- See the logs of the Vault nodes below ([WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=)

==> Vault server configuration:
Administrative Namespace:
Api Address: http://x.x.x.x:8200
Cgo: disabled
Cluster Address: https://vault-server-2.vault-server-internal:8201
Environment Variables: GODEBUG, GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_PROJECT, GOOGLE_REGION, HOME, HOSTNAME, HOST_IP, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, PATH, POD_IP, PWD, SHLVL, SKIP_CHOWN, SKIP_SETCAP, VAULT_ADDR, VAULT_API_ADDR, VAULT_CLUSTER_ADDR, VAULT_K8S_NAMESPACE, VAULT_K8S_POD_NAME, VAULT_SECRETS_SERVER_ACTIVE_PORT, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP_ADDR, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP_PORT, VAULT_SECRETS_SERVER_ACTIVE_PORT_8200_TCP_PROTO, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP_ADDR, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP_PORT, VAULT_SECRETS_SERVER_ACTIVE_PORT_8201_TCP_PROTO, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_HOST, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_PORT, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_PORT_HTTP, VAULT_SECRETS_SERVER_ACTIVE_SERVICE_PORT_HTTPS_INTERNAL, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP_ADDR, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP_PORT, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_PORT_443_TCP_PROTO, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_SERVICE_HOST, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_SERVICE_PORT, VAULT_SECRETS_SERVER_AGENT_INJECTOR_SVC_SERVICE_PORT_HTTPS, VAULT_SECRETS_SERVER_PORT, VAULT_SECRETS_SERVER_PORT_8200_TCP, VAULT_SECRETS_SERVER_PORT_8200_TCP_ADDR, VAULT_SECRETS_SERVER_PORT_8200_TCP_PORT, VAULT_SECRETS_SERVER_PORT_8200_TCP_PROTO, VAULT_SECRETS_SERVER_PORT_8201_TCP, VAULT_SECRETS_SERVER_PORT_8201_TCP_ADDR, VAULT_SECRETS_SERVER_PORT_8201_TCP_PORT, VAULT_SECRETS_SERVER_PORT_8201_TCP_PROTO, VAULT_SECRETS_SERVER_SERVICE_HOST, VAULT_SECRETS_SERVER_SERVICE_PORT, VAULT_SECRETS_SERVER_SERVICE_PORT_HTTP, VAULT_SECRETS_SERVER_SERVICE_PORT_HTTPS_INTERNAL, VAULT_SECRETS_SERVER_STANDBY_PORT, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP_ADDR, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP_PORT, VAULT_SECRETS_SERVER_STANDBY_PORT_8200_TCP_PROTO, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP_ADDR, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP_PORT, VAULT_SECRETS_SERVER_STANDBY_PORT_8201_TCP_PROTO, VAULT_SECRETS_SERVER_STANDBY_SERVICE_HOST, VAULT_SECRETS_SERVER_STANDBY_SERVICE_PORT, VAULT_SECRETS_SERVER_STANDBY_SERVICE_PORT_HTTP, VAULT_SECRETS_SERVER_STANDBY_SERVICE_PORT_HTTPS_INTERNAL, VERSION
Go Version: go1.21.3
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level:
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.15.2, built 2023-11-06T11:33:28Z
Version Sha: cf1b5cafa047bc8e4a3f93444fcb4011593b92cb
2024-01-19T17:01:31.979Z [WARN] unknown or unsupported field cluster_addr found in configuration at /tmp/storageconfig.hcl:10:1
2024-01-19T17:01:31.979Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2024-01-19T17:01:31.981Z [WARN] storage.raft.fsm: raft FSM db file has wider permissions than needed: needed=-rw------- existing=-rw-rw----
2024-01-19T17:01:32.928Z [INFO] incrementing seal generation: generation=1
2024-01-19T17:01:32.928Z [INFO] core: Initializing version history cache for core
2024-01-19T17:01:32.928Z [INFO] events: Starting event system
2024-01-19T17:01:32.929Z [INFO] core: stored unseal keys supported, attempting fetch

==> Vault server started! Log data will stream in below:

2024-01-19T17:01:33.002Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
2024-01-19T17:01:33.002Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2024-01-19T17:01:33.003Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"db3afed9-90ea-e8d8-855c-c87d11e87cce", NotifyCh:(chan<- bool)(0xc0035d30a0), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc0030201e0), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2024-01-19T17:01:33.006Z [INFO] storage.raft: initial configuration: index=65 servers="[{Suffrage:Voter ID:eecd350a-f2a7-403f-9702-619ba8cebe40 Address:vault-server-0.vault-server-internal:8201} {Suffrage:Voter ID:9b285315-469d-d3c1-be5f-9ba1a41f6760 Address:vault-server-1.vault-server-internal:8201} {Suffrage:Voter ID:db3afed9-90ea-e8d8-855c-c87d11e87cce Address:vault-server-2.vault-server-internal:8201}]"
2024-01-19T17:01:33.006Z [INFO] core: vault is unsealed
2024-01-19T17:01:33.006Z [INFO] core: unsealed with stored key
2024-01-19T17:01:33.006Z [INFO] storage.raft: entering follower state: follower="Node at vault-server-2.vault-server-internal:8201 [Follower]" leader-address= leader-id=
2024-01-19T17:01:33.006Z [INFO] core: entering standby mode
2024-01-19T17:01:48.023Z [WARN] storage.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2024-01-19T17:01:48.023Z [INFO] storage.raft: entering candidate state: node="Node at vault-server-2.vault-server-internal:8201 [Candidate]" term=32
2024-01-19T17:01:48.079Z [INFO] storage.raft: election won: term=32 tally=2
2024-01-19T17:01:48.079Z [INFO] storage.raft: entering leader state: leader="Node at vault-server-2.vault-server-internal:8201 [Leader]"
2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=eecd350a-f2a7-403f-9702-619ba8cebe40
2024-01-19T17:01:48.079Z [INFO] storage.raft: added peer, starting replication: peer=9b285315-469d-d3c1-be5f-9ba1a41f6760
2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter 9b285315-469d-d3c1-be5f-9ba1a41f6760 vault-server-1.vault-server-internal:8201}"
2024-01-19T17:01:48.081Z [INFO] storage.raft: pipelining replication: peer="{Voter eecd350a-f2a7-403f-9702-619ba8cebe40 vault-server-0.vault-server-internal:8201}"
2024-01-19T17:01:48.095Z [INFO] core: acquired lock, enabling active operation
2024-01-19T17:01:48.116Z [INFO] core: post-unseal setup starting
2024-01-19T17:01:48.138Z [INFO] core: loaded wrapping token key
2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin runtime catalog
2024-01-19T17:01:48.138Z [INFO] core: successfully setup plugin catalog: plugin-directory=""
2024-01-19T17:01:48.171Z [INFO] core: successfully mounted: type=system version="v1.15.2+builtin.vault" path=sys/ namespace="ID: root. Path: "
2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=identity version="v1.15.2+builtin.vault" path=identity/ namespace="ID: root. Path: "
2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=kv version="v0.16.1+builtin" path=secret/ namespace="ID: root. Path: "
2024-01-19T17:01:48.172Z [INFO] core: successfully mounted: type=cubbyhole version="v1.15.2+builtin.vault" path=cubbyhole/ namespace="ID: root. Path: "
2024-01-19T17:01:48.294Z [INFO] core: successfully mounted: type=token version="v1.15.2+builtin.vault" path=token/ namespace="ID: root. Path: "
2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=kubernetes version="v0.17.1+builtin" path=kubernetes/ namespace="ID: root. Path: "
2024-01-19T17:01:48.295Z [INFO] core: successfully mounted: type=userpass version="v1.15.2+builtin.vault" path=userpass/ namespace="ID: root. Path: "
2024-01-19T17:01:48.311Z [INFO] rollback: Starting the rollback manager with 256 workers
2024-01-19T17:01:48.311Z [INFO] rollback: starting rollback manager
2024-01-19T17:01:48.312Z [INFO] core: restoring leases
2024-01-19T17:01:48.394Z [INFO] expiration: lease restore complete
2024-01-19T17:01:48.395Z [INFO] identity: entities restored
2024-01-19T17:01:48.418Z [INFO] identity: groups restored
2024-01-19T17:01:48.445Z [INFO] core: starting raft active node
2024-01-19T17:01:48.445Z [INFO] storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s false redundancy_zone upgrade_version}" reconcile_interval=0s
2024-01-19T17:01:48.470Z [INFO] core: usage gauge collection is disabled
2024-01-19T17:01:48.718Z [INFO] core: post-unseal setup complete
2024-01-19T17:04:23.948Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:14:19.065Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:29:23.960Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:29:23.977Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:34:19.057Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
2024-01-19T17:54:19.054Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=[]
Expected behavior
There should be no TLS-related warnings or errors in the Vault cluster logs.
Environment:
- Vault Server Version (retrieve with vault status):

/ $ vault status
Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.15.2
Build Date               2023-11-06T11:33:28Z
Storage Type             raft
Cluster Name             vault-cluster-eeed0108
Cluster ID               64d0461e-ac12-ca0c-7bf2-b10ab4629939
HA Enabled               true
HA Cluster               https://vault-server-2.vault-server-internal:8201
HA Mode                  active
Active Since             2024-01-19T17:01:48.116967849Z
Raft Committed Index     52557
Raft Applied Index       52557
- Vault CLI Version (retrieve with vault version): Vault v1.15.2
- Server Operating System/Architecture: Kubernetes on DigitalOcean
Vault server configuration file(s): values.yaml for the vault helm chart:

server:
  extraEnvironmentVars:
    GOOGLE_REGION: $some_value
    GOOGLE_PROJECT: $some_value
    GOOGLE_APPLICATION_CREDENTIALS: $some_value
  extraVolumes:
    - type: 'secret'
      name: '$some_value'
  dataStorage:
    enabled: true
    # Size of the PVC created
    size: $some_value
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        ui = true
        storage "raft" {
          path = "/vault/data"
        }
        listener "tcp" {
          address = "0.0.0.0:8200"
          cluster_addr = "0.0.0.0:8201"
          tls_disable = "true"
        }
        seal "gcpckms" {
          credentials = "$some_value"
          project = "$some_value"
          region = "$some_value"
          key_ring = "$some_value"
          crypto_key = "$some_value"
        }
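One detail worth flagging in the config above: the startup logs report `[WARN] unknown or unsupported field cluster_addr found in configuration`. Inside a `listener "tcp"` stanza the parameter is spelled `cluster_address`; `cluster_addr` is only valid at the top level of the server config, where it should be an address other nodes can reach. A sketch of the corrected listener stanza (the surrounding config is unchanged; the top-level example address is an assumption based on the pod names in this thread):

```hcl
# Inside the listener stanza the key is cluster_address, not cluster_addr.
listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_disable     = "true"
}

# cluster_addr, by contrast, is a top-level parameter advertising this node's
# cluster address to its peers, e.g. (hypothetical value):
# cluster_addr = "https://vault-server-0.vault-server-internal:8201"
```

This does not necessarily explain the ALPN warnings by itself, but it removes one source of misconfiguration the server is already complaining about.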
Seeing the same issue on EKS 1.29, Vault 1.15.6, Postgres backend. vault status looks fine and HA appears to work (e.g. killing the active node causes a standby to be promoted). Based on other threads I suspect it relates to health checks or liveness probes, but I have taken those from the official helm charts, so I'm not sure how to adjust them to eliminate this confusing logging.
Same issue after upgrading Kubernetes from v1.27.x to v1.28.2:
2024-04-01T02:32:21.599Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2024-04-01T02:32:33.396Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:32:33.396Z [ERROR] core: forward request error: error="error during forwarding RPC request"
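One quick way to check whether the standby nodes agree on who the active node is, is the unauthenticated sys/leader endpoint. This is a hedged sketch; the pod names, namespace, and availability of wget in the container are assumptions based on the default helm chart:

```shell
# Ask each node who it thinks the leader is (no token required).
for i in 0 1 2; do
  kubectl -n vault exec vault-$i -- \
    wget -qO- http://127.0.0.1:8200/v1/sys/leader
  echo
done
# Compare leader_address / leader_cluster_address across the nodes; a node
# whose view disagrees with the others is typically the one logging
# forwarding errors.
```
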
Sadly, the same version worked in staging but broke prod. Everything is the same in prod, so I'm not sure what's going on here. Even upgrading the helm chart (vault 1.25.2 and 1.16) didn't help, and the vault healthcheck is also green :(
2024-04-01T02:36:59.653Z [INFO] events: Starting event system
2024-04-01T02:36:59.654Z [INFO] core: stored unseal keys supported, attempting fetch
2024-04-01T02:36:59.668Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=[::]:8201
2024-04-01T02:36:59.669Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2024-04-01T02:36:59.669Z [INFO] core: vault is unsealed
2024-04-01T02:36:59.669Z [INFO] core: entering standby mode
2024-04-01T02:36:59.767Z [INFO] core: unsealed with stored key
2024-04-01T02:37:40.306Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:37:40.306Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2024-04-01T02:38:39.039Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:38:39.039Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2024-04-01T02:39:27.125Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:39:27.126Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2024-04-01T02:39:50.468Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T02:39:50.468Z [ERROR] core: forward request error: error="error during forwarding RPC request"
vault status looks ok:
vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 5
Threshold 3
Version 1.15.2
Build Date 2023-11-06T11:33:28Z
Storage Type mysql
Cluster Name vault-cluster-639b256d
Cluster ID 0a28b613-6e6b-48a8-f3eb-4c7fb59882d9
HA Enabled true
HA Cluster https://vault-0.vault-internal:8201
HA Mode standby
Active Node Address http://10.244.1.135:8200
TLS is also disabled:
Go Version: go1.21.3
Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
and
listener "tcp" {
tls_disable = 1
address = "[::]:8200"
cluster_address = "[::]:8201"
}
From the debug logs:
2024-04-01T03:41:06.811Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
2024-04-01T03:41:06.811Z [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
2024-04-01T03:41:06.812Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
2024-04-01T03:41:07.628Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: remote error: tls: internal error\""
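Context that may help interpret these logs: the cluster port (8201) always speaks TLS with an internally generated certificate, even when `tls_disable` is set on the API listener, and it expects connections that negotiate Vault's request-forwarding ALPN value. Anything else that opens a connection to 8201, such as a load balancer or kubelet TCP health check, can trigger exactly this `no TLS config found for ALPN` warning. A hedged way to inspect what a node presents on 8201 (the hostname is an assumption based on default chart naming):

```shell
# Connect to the cluster port with Vault's request-forwarding ALPN value and
# dump the certificate the node presents. Connecting WITHOUT -alpn should make
# the node log "no TLS config found for ALPN: ALPN=[]".
openssl s_client -connect vault-0.vault-internal:8201 \
  -alpn req_fw_sb-act_v1 -showcerts </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -enddate
```
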
Same issue here; I see a bad-certificate error in the clustering between nodes:
{"@level":"debug","@message":"forwarding: error sending echo request to active node","@module":"core","@timestamp":"2024-04-21T17:56:20.646062Z","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: tls: failed to verify certificate: x509: certificate is valid for fw-c662f8a7-1ff3-33e9-c4fb-28a37bcbdf43, not fw-fd671ed2-3dbe-2f7a-7433-bb1ac5d3d632\""}
Also seeing this issue as well running a 3 node cluster on EKS 1.26 (with a DynamoDB backend) & GKE 1.27 (running a GCS backend) both running Vault 1.16.2.
https://github.com/hashicorp/vault/issues/10395 — this closed issue also seems to document the behaviour, and suggests the root cause is being unable to correctly select the leader/active node.
We have found that increasing the replicas to 5 sometimes helps elect a leader correctly, or alternatively scaling down to a single node and then adding each node back one at a time until we reach the desired 3 replicas. Otherwise the pods continue to fail the liveness probes and end up in a crash loop.
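The scale-down/up recovery described above can be sketched roughly as follows. The statefulset and namespace names are assumptions, and with raft integrated storage you may also need to remove dead peers (`vault operator raft remove-peer`) rather than just scaling:

```shell
# Shrink the cluster to a single node and wait for it to become active.
kubectl -n vault scale statefulset vault --replicas=1
kubectl -n vault exec vault-0 -- vault status   # HA Mode should read "active"

# Then add the remaining nodes back one at a time.
for n in 2 3; do
  kubectl -n vault scale statefulset vault --replicas=$n
  kubectl -n vault rollout status statefulset vault
done
```
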
Hello,
We are having similar problems with our Vault deployment installed with helm chart.
Currently, we have two environments (staging and production) with the same vault configuration.
Staging environment is running on k8s 1.30 and vault is installed using chart 0.28.1. This configuration seems to work even though occasionally we are getting:
core: error during forwarded RPC request: error="rpc error: code = Canceled desc = context canceled"
core: forward request error: error="error during forwarding RPC request"
Production is the problematic one (k8s 1.29): we had to downgrade to chart version 0.24.1 (from 0.28.1) because our pods started restarting, causing Vault to become sealed. (The restarts were caused by the helm chart's default liveness probes failing.)
Error/warning logs started occurring a few minutes after the pods started. They are the same as above, with the addition of the following:
core.cluster-listener: no TLS config found for ALPN: ALPN=[req_fw_sb-act_v1]
Vault configuration for both environments:
server:
  annotations:
    ad.datadoghq.com/vault.logs: XXXXXXXXXX
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: XXXXXXXXX
  volumes:
    - name: node-cert
      secret:
        secretName: vault-node-cert
  volumeMounts:
    - mountPath: /etc/certs
      name: node-cert
      readOnly: true
  ha:
    enabled: true
    replicas: 2
    apiAddr: https://vault.{{ .Values.dns_subdomain }}:443
    disruptionBudget:
      maxUnavailable: 1
    config: |
      ui = true
      listener "tcp" {
        tls_cert_file = "/etc/certs/tls.crt"
        tls_key_file = "/etc/certs/tls.key"
        tls_client_ca_file = "/etc/certs/ca.crt"
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      seal "awskms" {
        region = "{{ .Values.aws.region }}"
        kms_key_id = "{{ .Values.aws.kms_key_id }}"
      }
      storage "dynamodb" {
        ha_enabled = "true"
        region = "{{ .Values.aws.region }}"
        table = "{{ .Values.env_name }}-apps-vault-data"
      }
      service_registration "kubernetes" {}
  service:
    enabled: true
    port: 8200
    targetPort: 8200
In both cases we are advertising the API address through a Load Balancer.
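One hedged thing to check with this load-balancer setup: `api_addr` is the address standbys use to redirect and forward client requests to the active node. If every node advertises the same shared LB hostname, forwarded cluster connections can land on the wrong node and fail the internal TLS handshake, which would be consistent with the `certificate is valid for fw-..., not fw-...` error earlier in this thread. A sketch of advertising per-pod addresses instead, using an environment variable the chart already injects; the service name and exact values here are assumptions, not a verified fix:

```yaml
server:
  ha:
    # Let each pod advertise itself instead of the shared LB hostname.
    # VAULT_K8S_POD_NAME is populated by the chart via the downward API.
    apiAddr: "https://$(VAULT_K8S_POD_NAME).vault-internal:8200"
    clusterAddr: "https://$(VAULT_K8S_POD_NAME).vault-internal:8201"
```

The external LB address would then only be used by clients, not for node-to-node forwarding.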