New node cannot join current running cluster (failed to unseal core: error="stored unseal keys are supported, but none were found")
Describe the bug
Hi. I have a cluster of 3 nodes (initialized, unsealed, working, and holding a lot of secrets) using the raft storage backend with auto-unseal. The auto-unseal type is GCP KMS.
I am trying to add 2 new nodes to the cluster (one by one). The config is delivered using Ansible and is the same across the cluster (except for addr and other node-to-node params).
After the node service is started, I receive the following error:
2023-10-31T15:31:55.144Z [INFO] core: stored unseal keys supported, attempting fetch
2023-10-31T15:31:55.144Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
To Reproduce
Steps to reproduce the behavior:
- Have a working cluster with 3 nodes
- Install vault 1.15.1 to the new node
- Start service vault OR run vault server -config vault.hcl
- See error in logs
Expected behavior
A new node successfully joins the cluster and is unsealed with auto-unseal.
Environment:
- Vault Server Version (retrieve with vault status): 1.15.1
- Vault CLI Version (retrieve with vault version): Vault v1.15.1 (b94e275f25ccd9011146d14c00ea9e49fd5032dc), built 2023-10-20T19:16:11Z
- Server Operating System/Architecture: CentOS 8 Stream x86_64
Vault server configuration file(s):
# ---------------------------------------------------------------------------
# config general
# ---------------------------------------------------------------------------
ui = true
disable_mlock = true
plugin_directory = "/usr/local/lib/vault/plugins"
# ---------------------------------------------------------------------------
# config cluster
# ---------------------------------------------------------------------------
# Configure clustering.
api_addr = "http://hashicorp-vault-02.domain.dev:8200"
# The URL where cluster members can find the leader.
cluster_addr = "http://hashicorp-vault-02.domain.dev:8201"
# ---------------------------------------------------------------------------
# config transit seal
# ---------------------------------------------------------------------------
seal "gcpckms" {
credentials = "<path_to_creds.json>"
project = "vault-stage"
region = "eur6"
key_ring = "vault-stage-autounseal"
crypto_key = "gcp-auto-unseal-test"
}
# ---------------------------------------------------------------------------
# config listeners
# ---------------------------------------------------------------------------
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "hashicorp-vault-02.domain.dev:8201"
max_request_duration = "180s"
proxy_protocol_behavior = "use_always"
http_idle_timeout = "10m"
tls_disable = "true"
}
# ---------------------------------------------------------------------------
# config storage
# ---------------------------------------------------------------------------
storage "raft" {
path = "/opt/vault/data"
node_id = "hashicorp-vault-02"
retry_join {
leader_api_addr = "http://hashicorp-vault-03.domain.dev:8200"
}
retry_join {
leader_api_addr = "http://hashicorp-vault-04.domain.dev:8200"
}
retry_join {
leader_api_addr = "http://hashicorp-vault-02.domain.dev:8200"
}
}
# ---------------------------------------------------------------------------
# config Prometheus metrics
# ---------------------------------------------------------------------------
telemetry {
disable_hostname = true
prometheus_retention_time = "12h"
}
Vault service systemd file:
[Unit]
Description=HashiCorp Vault
Requires=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/vault server -config "/etc/vault.d/vault.hcl" -log-level=trace
ExecReload=/bin/kill --signal HUP $MAINPID
KillSignal=SIGINT
User=vault
Group=vault
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Logs from service
==> Vault server configuration:
Administrative Namespace:
Api Address: http://hashicorp-vault-05.domain.dev:8200
Cgo: disabled
Cluster Address: https://hashicorp-vault-05.domain.dev:8201
Environment Variables: BASH_FUNC_which%%, DEBUGINFOD_URLS, GODEBUG, HISTSIZE, HOME, HOSTNAME, LANG, LC_ALL, LC_CTYPE, LESSOPEN, LOGNAME, LS_COLORS, MAIL, OLDPWD, PATH, PWD, SHELL, SHLVL, SUDO_COMMAND, SUDO_GID, SUDO_UID, SUDO_USER, S_COLORS, TERM, USER, VAULT_ADDR, VAULT_CLIENT_TIMEOUT, _, which_declare
Go Version: go1.21.3
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "hashicorp-vault-05.domain.dev:8201", max_request_duration: "3m0s", max_request_size: "33554432", tls: "disabled")
Log Level:
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.15.1, built 2023-10-20T19:16:11Z
Version Sha: b94e275f25ccd9011146d14c00ea9e49fd5032dc
==> Vault server started! Log data will stream in below:
2023-10-31T15:24:09.683Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2023-10-31T15:24:10.073Z [INFO] incrementing seal generation: generation=1
2023-10-31T15:24:10.074Z [INFO] core: Initializing version history cache for core
2023-10-31T15:24:10.074Z [INFO] events: Starting event system
2023-10-31T15:24:10.074Z [INFO] core: raft retry join initiated
2023-10-31T15:24:10.074Z [INFO] core: stored unseal keys supported, attempting fetch
2023-10-31T15:24:10.074Z [INFO] core: security barrier not initialized
2023-10-31T15:24:10.074Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2023-10-31T15:24:10.075Z [INFO] core: security barrier not initialized
2023-10-31T15:24:10.075Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://hashicorp-vault-03.domain.dev:8200
2023-10-31T15:24:10.075Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://hashicorp-vault-02.domain.dev:8200
2023-10-31T15:24:10.075Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://hashicorp-vault-04.domain.dev:8200
2023-10-31T15:24:10.146Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=<node_ip>:8201
2023-10-31T15:24:10.146Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=<node_ip>:8201
2023-10-31T15:24:10.147Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"hashicorp-vault-05\", NotifyCh:(chan<- bool)(0xc0034e3ea0), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc002eba9f0), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2023-10-31T15:24:10.148Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:hashicorp-vault-03 Address:hashicorp-vault-03.domain.dev:8201} {Suffrage:Voter ID:hashicorp-vault-04 Address:hashicorp-vault-04.domain.dev:8201} {Suffrage:Voter ID:hashicorp-vault-02 Address:hashicorp-vault-02.domain.dev:8201} {Suffrage:Nonvoter ID:hashicorp-vault-05 Address:follower-sec-hashicorp-vault-hotel-05.domain.dev:8201}]"
2023-10-31T15:24:10.148Z [INFO] core: successfully joined the raft cluster: leader_addr=http://hashicorp-vault-04.domain.dev:8200
2023-10-31T15:24:10.148Z [INFO] storage.raft: entering follower state: follower="Node at hashicorp-vault-05.domain.dev:8201 [Follower]" leader-address= leader-id=
2023-10-31T15:24:15.075Z [INFO] core: stored unseal keys supported, attempting fetch
2023-10-31T15:24:15.075Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2023-10-31T15:24:20.075Z [INFO] core: stored unseal keys supported, attempting fetch
2023-10-31T15:24:20.075Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2023-10-31T15:24:25.076Z [INFO] core: stored unseal keys supported, attempting fetch
2023-10-31T15:24:25.076Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2023-10-31T15:24:26.628Z [WARN] storage.raft: heartbeat timeout reached, not part of a stable configuration or a non-voter, not triggering a leader election
2023-10-31T15:24:30.077Z [INFO] core: stored unseal keys supported, attempting fetch
2023-10-31T15:24:30.077Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
and so on, repeating the failed to unseal core: error="stored unseal keys are supported, but none were found" error.
Is there any news on how to solve this?
@ccapurso Hi! Is there any news on this bug? We are still unable to add nodes to the cluster
If I understand correctly, failed to unseal core: error="stored unseal keys are supported, but none were found" means that the node has joined the cluster successfully, but no data has been replicated to it from any active cluster node, so it doesn't have what it needs to unseal itself. That leaves the node in a kind of inconsistent state.
I had this error, and it was simply because port 8201 was not opened in the security group. This is perhaps not the same root cause as OP, but I felt it does at least warrant a comment here.
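For anyone who wants to rule this out quickly, here is a minimal sketch of a reachability check for the cluster port, assuming Python 3 is available on the nodes. The helper name port_open is hypothetical and not part of Vault. A successful TCP connect only shows that the port is reachable; it says nothing about whether raft replication itself is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run it from the current leader toward the joining node, for example port_open("10.60.2.3", 8201) on vault-server-1 in the setup described below, since that is the direction the leader dials when it replicates data.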
I was scratching my head over this for a bit before I understood what the problem was, because the log messages are kind of confusing in this case, especially on the joining node.
In this setup I have three server nodes which are in a cluster, and one master node which is not in a cluster. The master node has the transit engine enabled and is configured for auto unseal. The server nodes are configured to auto unseal using the transit engine that is set up on the master node.
- vault-master: Not part of cluster. Uses transit engine to provide auto unseal for the cluster servers.
- vault-server-1: Cluster node. Initialized and unsealed and working fine.
- vault-server-2: Cluster node. Not joined.
- vault-server-3: Cluster node. Not joined. Attempting to join now.
vault-server-3 claims to have successfully joined the cluster. It shows no error at all. It would be a lot better if it would produce some kind of error during the cluster join process.
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.839Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://10.60.2.1:8200
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.856Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=127.0.0.1:8201
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.856Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=10.60.2.3:8201
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.856Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=127.0.0.1:8201
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.856Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=10.60.2.3:8201
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.859Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemov>
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.860Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault-server-1 Address:10.60.2.1:8201} {Suffrage:Nonvoter ID:vault-server-3 Address:10.60.2.3:8201}]"
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.860Z [INFO] core: successfully joined the raft cluster: leader_addr=http://10.60.2.1:8200
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.864Z [INFO] storage.raft: entering follower state: follower="Node at 10.60.2.3:8201 [Follower]" leader-address= leader-id=
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.925Z [INFO] core: stored unseal keys supported, attempting fetch
Dec 15 13:20:33 vault-server-3 vault[1441]: 2023-12-15T13:20:33.925Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
Dec 15 13:20:38 vault-server-3 vault[1441]: 2023-12-15T13:20:38.925Z [INFO] core: stored unseal keys supported, attempting fetch
Dec 15 13:20:38 vault-server-3 vault[1441]: 2023-12-15T13:20:38.927Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
vault-server-1 also seems to be happy with the join. But at least it reports an i/o timeout when trying to replicate data, which is what helped me understand the problem.
Dec 15 13:20:33 vault-server-1 vault[843]: 2023-12-15T13:20:33.850Z [INFO] storage.raft: updating configuration: command=AddNonvoter server-id=vault-server-3 server-addr=10.60.2.3:8201 servers="[{Suffrage:Voter ID:vault-server-1 Address:10.60.2.1:8201} {Suffrage:Nonvoter ID:vault-se>
Dec 15 13:20:33 vault-server-1 vault[843]: 2023-12-15T13:20:33.853Z [INFO] storage.raft: added peer, starting replication: peer=vault-server-3
Dec 15 13:20:33 vault-server-1 vault[843]: 2023-12-15T13:20:33.853Z [INFO] system: follower node answered the raft bootstrap challenge: follower_server_id=vault-server-3
Dec 15 13:20:43 vault-server-1 vault[843]: 2023-12-15T13:20:43.854Z [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter vault-server-3 10.60.2.3:8201}" error="dial tcp 10.60.2.3:8201: i/o timeout"
Dec 15 13:20:44 vault-server-1 vault[843]: 2023-12-15T13:20:44.572Z [ERROR] storage.raft: failed to heartbeat to: peer=10.60.2.3:8201 backoff time=10ms error="dial tcp 10.60.2.3:8201: i/o timeout"
Dec 15 13:20:53 vault-server-1 vault[843]: 2023-12-15T13:20:53.866Z [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter vault-server-3 10.60.2.3:8201}" error="dial tcp 10.60.2.3:8201: i/o timeout"
Dec 15 13:20:55 vault-server-1 vault[843]: 2023-12-15T13:20:55.380Z [ERROR] storage.raft: failed to heartbeat to: peer=10.60.2.3:8201 backoff time=10ms error="dial tcp 10.60.2.3:8201: i/o timeout"
Using Vault v1.15.4.
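Since the only useful hint was on the leader side, a small log filter can surface it faster. This is a rough sketch, assuming the leader's log lines look like the storage.raft errors quoted above; the function name unreachable_peers and the regular expression are my own and not part of Vault.

```python
import re

# Matches the dial failures shown in the leader logs above, for example:
#   error="dial tcp 10.60.2.3:8201: i/o timeout"
DIAL_ERROR = re.compile(r'error="dial tcp (?P<addr>[^\s"]+): (?P<reason>[^"]+)"')


def unreachable_peers(log_lines):
    """Map each peer address to the set of dial error reasons seen for it."""
    failures = {}
    for line in log_lines:
        # Only consider raft replication and heartbeat failures.
        if "storage.raft: failed to" not in line:
            continue
        match = DIAL_ERROR.search(line)
        if match:
            failures.setdefault(match.group("addr"), set()).add(match.group("reason"))
    return failures
```

Feeding it the leader's log lines returns the peer addresses the leader could not dial, which is the signal that is missing from the joining node's own logs.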
Hi.
@thnee Thank you for your comment. However, that is not my case. I have tested with an any-any security group rule on all of the nodes and I am still getting the same error :(
@ccapurso Hi! Is there any news?
Hi @ccapurso, is there any update on this?
Hi @ccapurso, Any update? :( Still getting the same error with auto-unseal with Azure KMS.