Raft rejoin issue
Describe the bug
Using the Vault Helm chart with Raft storage and HA enabled. After unsealing and joining the peers to the Raft cluster, deleting one of the pods leaves it unable to rejoin the Raft cluster, and the other nodes still try to communicate with the old pod.
To Reproduce
Steps to reproduce the behavior:
- helm install vault -f values-raft.yaml .
- kubectl exec vault-0 -- /bin/sh -c "vault operator init -recovery-shares=1 -recovery-threshold=1 > /vault/data/recovery-key.txt"
- kubectl exec -ti vault-1 -- vault operator raft join http://vault-0.vault-headless:8200
- kubectl exec -ti vault-2 -- vault operator raft join http://vault-0.vault-headless:8200
- kubectl delete pods vault-1
- kubectl logs vault-0
Expected behavior
The new pod should be able to rejoin the Raft cluster, and the other nodes should stop contacting the old Raft node.
Environment:
- Vault Server Version : 1.3.2 / 1.4.0-beta1
- Vault CLI Version : 1.3.2 / 1.4.0-beta1
- Server Operating System/Architecture: kubernetes
Vault server configuration file(s):
server:
  image:
    repository: "vault"
    tag: "1.4.0-beta1"
    # Overrides the default Image Pull Policy
    pullPolicy: IfNotPresent
  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_TOKEN: <used for transit unseal>
  ha:
    enabled: true
    raft:
      enabled: true
  service:
    enabled: true
  headless:
    enabled: true
  config: |
    ui = true
    cluster_addr = "http://POD_IP:8201"
    api_addr = "http://vault-0.vault-headless:8200"

    listener "tcp" {
      tls_disable = 1
      address = "[::]:8200"
      cluster_address = "[::]:8201"
    }

    log_level = "Debug"

    storage "raft" {
      path = "/vault/data"
      node_id = "POD_IP:8201"
    }

    seal "transit" {
      address = "https://api.example.org"
      disable_renewal = "false"
      key_name = "examplekey"
      mount_path = "transit/"
      tls_skip_verify = "true"
    }

injector:
  # True if you want to enable vault agent injection.
  enabled: false
Using the instructions from here: https://github.com/hashicorp/vault-helm/issues/40
The logs and Kubernetes info below show the vault-2 pod being deleted while vault-0 keeps using the old node_id. The behaviour is the same with 1.4.0-beta1.
I managed to run raft remove-peer to remove the old peer, but the new pod still can't rejoin, and I don't know how to proceed, so I need some guidance.
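For reference, the remove-peer step looked roughly like the sketch below; the peer ID is the stale node_id that shows up in the logs further down, and it assumes a logged-in token inside the pod (on recent versions the current membership can be listed with vault operator raft list-peers):
$ kubectl exec -ti vault-0 -- vault operator raft remove-peer 10.10.109.83:8201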
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP
pod/vault-0 1/1 Running 0 7m50s 10.10.117.75
pod/vault-1 1/1 Running 0 7m52s 10.10.112.90
pod/vault-2 1/1 Running 0 7m51s 10.10.109.83
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/vault ClusterIP 10.10.54.144 <none> 8200/TCP,8201/TCP 11m app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
service/vault-headless ClusterIP None <none> 8200/TCP,8201/TCP 11m app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
NAME READY AGE CONTAINERS IMAGES
statefulset.apps/vault 3/3 11m vault vault:1.3.2
$ kubectl describe service vault-headless
Name: vault-headless
Namespace: default
Labels: app.kubernetes.io/instance=vault
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=vault
helm.sh/chart=vault-0.4.0
Annotations: service.alpha.kubernetes.io/tolerate-unready-endpoints: true
Selector: app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
Type: ClusterIP
IP: None
Port: http 8200/TCP
TargetPort: 8200/TCP
Endpoints: 10.10.109.83:8200,10.10.112.90:8200,10.10.117.75:8200
Port: internal 8201/TCP
TargetPort: 8201/TCP
Endpoints: 10.10.109.83:8201,10.10.112.90:8201,10.10.117.75:8201
Session Affinity: None
Events: <none>
$ kubectl describe service vault
Name: vault
Namespace: default
Labels: app.kubernetes.io/instance=vault
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=vault
helm.sh/chart=vault-0.4.0
Annotations: service.alpha.kubernetes.io/tolerate-unready-endpoints: true
Selector: app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
Type: ClusterIP
IP: 10.10.54.144
Port: http 8200/TCP
TargetPort: 8200/TCP
Endpoints: 10.10.109.83:8200,10.10.112.90:8200,10.10.117.75:8200
Port: internal 8201/TCP
TargetPort: 8201/TCP
Endpoints: 10.10.109.83:8201,10.10.112.90:8201,10.10.117.75:8201
Session Affinity: None
Events: <none>
$ kubectl exec vault-2 vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 1
Threshold 1
Version 1.3.2
Cluster Name vault-cluster-175d901f
Cluster ID f034ecf4-0a4b-f103-fa8c-12b8c8ce2e3b
HA Enabled true
HA Cluster https://10.10.117.75:8201
HA Mode standby
Active Node Address http://10.10.117.75:8200
$ kubectl delete pods vault-2
pod "vault-2" deleted
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP
pod/vault-0 1/1 Running 0 14m 10.10.117.75
pod/vault-1 1/1 Running 0 14m 10.10.112.90
pod/vault-2 0/1 ContainerCreating 0 4s <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.10.0.1 <none> 443/TCP 36d <none>
service/vault ClusterIP 10.10.54.144 <none> 8200/TCP,8201/TCP 17m app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
service/vault-headless ClusterIP None <none> 8200/TCP,8201/TCP 17m app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
NAME READY AGE CONTAINERS IMAGES
statefulset.apps/vault 2/3 17m vault vault:1.3.2
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP
pod/vault-0 1/1 Running 0 14m 10.10.117.75
pod/vault-1 1/1 Running 0 14m 10.10.112.90
pod/vault-2 1/1 Running 0 35s 10.10.109.84
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.10.0.1 <none> 443/TCP 36d <none>
service/vault ClusterIP 10.10.54.144 <none> 8200/TCP,8201/TCP 17m app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
service/vault-headless ClusterIP None <none> 8200/TCP,8201/TCP 17m app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault,component=server
NAME READY AGE CONTAINERS IMAGES
statefulset.apps/vault 3/3 17m vault vault:1.3.2
$ kubectl exec vault-2 vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 1
Threshold 1
Version 1.3.2
Cluster Name vault-cluster-175d901f
Cluster ID f034ecf4-0a4b-f103-fa8c-12b8c8ce2e3b
HA Enabled true
HA Cluster https://10.10.117.75:8201
HA Mode standby
Active Node Address http://10.10.117.75:8200
2020-03-06T10:51:44.873Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m30.196881968s
2020-03-06T10:51:44.970Z [ERROR] storage.raft: failed to heartbeat to: peer=10.10.109.83:8201 error="dial tcp 10.10.109.83:8201: i/o timeout"
2020-03-06T10:51:46.800Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.83:8201 10.10.109.83:8201}" error="dial tcp 10.10.109.83:8201: i/o timeout"
2020-03-06T10:51:47.348Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m32.671358765s
2020-03-06T10:51:49.835Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m35.158308071s
2020-03-06T10:51:52.312Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m37.635346099s
2020-03-06T10:51:54.763Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m40.086193861s
2020-03-06T10:51:56.096Z [DEBUG] storage.raft.stream: creating rpc dialer: host=raft-65724cee-d53e-da4c-b45b-2347fc59e9ee
2020-03-06T10:51:57.090Z [DEBUG] storage.raft.stream: creating rpc dialer: host=raft-65724cee-d53e-da4c-b45b-2347fc59e9ee
2020-03-06T10:51:57.196Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m42.519957448s
2020-03-06T10:51:59.641Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m44.964353521s
2020-03-06T10:52:02.075Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m47.398880975s
2020-03-06T10:52:04.532Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m49.855738741s
2020-03-06T10:52:06.096Z [ERROR] storage.raft: failed to heartbeat to: peer=10.10.109.83:8201 error="dial tcp 10.10.109.83:8201: i/o timeout"
2020-03-06T10:52:06.974Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m52.297888762s
2020-03-06T10:52:07.091Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.83:8201 10.10.109.83:8201}" error="dial tcp 10.10.109.83:8201: i/o timeout"
2020-03-06T10:52:09.444Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m54.767375921s
2020-03-06T10:52:11.899Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.83:8201 time=2m57.222612534s
2020-03-06T12:12:54.494Z [DEBUG] core.cluster-listener: performing client cert lookup
2020-03-06T12:12:56.593Z [DEBUG] core.cluster-listener: performing server cert lookup
2020-03-06T12:12:56.679Z [DEBUG] core.request-forward: got request forwarding connection
2020-03-06T12:14:31.567Z [INFO] storage.raft: aborting pipeline replication: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}"
2020-03-06T12:14:31.620Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}" error=EOF
2020-03-06T12:14:31.686Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:31.687Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}" error="dial tcp 10.10.109.93:8201: connect: connection refused"
2020-03-06T12:14:31.704Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:31.705Z [ERROR] storage.raft: failed to heartbeat to: peer=10.10.109.93:8201 error="dial tcp 10.10.109.93:8201: connect: connection refused"
2020-03-06T12:14:31.763Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:31.764Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}" error="dial tcp 10.10.109.93:8201: connect: connection refused"
2020-03-06T12:14:31.879Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:31.880Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}" error="dial tcp 10.10.109.93:8201: connect: connection refused"
2020-03-06T12:14:31.977Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:32.466Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:34.067Z [WARN] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=2.500135648s
2020-03-06T12:14:36.560Z [WARN] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=4.993262183s
2020-03-06T12:14:39.051Z [WARN] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=7.48483423s
2020-03-06T12:14:41.506Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=9.9395677s
2020-03-06T12:14:41.978Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}" error="dial tcp 10.10.109.93:8201: i/o timeout"
2020-03-06T12:14:42.134Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:42.466Z [ERROR] storage.raft: failed to heartbeat to: peer=10.10.109.93:8201 error="dial tcp 10.10.109.93:8201: i/o timeout"
2020-03-06T12:14:43.095Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:43.930Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=12.363453907s
2020-03-06T12:14:46.410Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=14.843281983s
2020-03-06T12:14:48.764Z [DEBUG] core.cluster-listener: performing server cert lookup
2020-03-06T12:14:48.883Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=17.316212078s
2020-03-06T12:14:48.897Z [DEBUG] core.request-forward: got request forwarding connection
2020-03-06T12:14:51.326Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=19.759459187s
2020-03-06T12:14:52.134Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}" error="dial tcp 10.10.109.93:8201: i/o timeout"
2020-03-06T12:14:52.295Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:53.095Z [ERROR] storage.raft: failed to heartbeat to: peer=10.10.109.93:8201 error="dial tcp 10.10.109.93:8201: i/o timeout"
2020-03-06T12:14:53.757Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=22.190566992s
2020-03-06T12:14:53.999Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:14:56.247Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=24.680257427s
2020-03-06T12:14:58.713Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=27.146831076s
2020-03-06T12:15:01.174Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=29.607497912s
2020-03-06T12:15:02.295Z [ERROR] storage.raft: failed to appendEntries to: peer="{Voter 10.10.109.93:8201 10.10.109.93:8201}" error="dial tcp 10.10.109.93:8201: i/o timeout"
2020-03-06T12:15:02.699Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:15:03.633Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=32.066521407s
2020-03-06T12:15:04.000Z [ERROR] storage.raft: failed to heartbeat to: peer=10.10.109.93:8201 error="dial tcp 10.10.109.93:8201: i/o timeout"
2020-03-06T12:15:04.804Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-3413b47b-735b-f375-1bc6-018a7d0a77c9
2020-03-06T12:15:06.053Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=34.48597863s
2020-03-06T12:15:08.540Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=36.97298619s
2020-03-06T12:15:11.016Z [DEBUG] storage.raft: failed to contact: server-id=10.10.109.93:8201 time=39.448944506s
Hello -
This may be an issue with how the pod is destroyed according to the helm chart, but I'm not 100% versed in helm and kubernetes so maybe I'm wrong 😄
https://github.com/hashicorp/vault-helm/blob/9d92922c9dc1500642278b172a7150c32534de0b/templates/server-statefulset.yaml#L124-L136
It seems we simply kill the process, which is normally fine for Vault, but with Raft I believe we need another step:
$ vault operator raft remove-peer <peer id>
- https://learn.hashicorp.com/vault/operations/raft-storage-aws#remove-a-cluster-member
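A minimal sketch of how that extra step could be wired into the chart's StatefulSet as a preStop lifecycle hook (hypothetical; it assumes VAULT_ADDR and a sufficiently privileged token are available in the pod, and that the Raft node_id matches $HOSTNAME):
  lifecycle:
    preStop:
      exec:
        command:
          - /bin/sh
          - -c
          # Hypothetical: deregister this node from the Raft cluster before the pod is stopped.
          - vault operator raft remove-peer "$HOSTNAME"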
cc @jasonodonnell
This issue may be more appropriate to be on hashicorp/vault-helm but we can leave it here for now until there's a bit more investigation.
Thanks!
Hi @ngarafol,
The following environment variable needs to be added to the Vault StatefulSet for this to work:
- name: VAULT_CLUSTER_ADDR
  value: "https://$(HOSTNAME):8201"
This will change Vault to use DNS names instead of IP addresses when tracking nodes in the cluster.
Hope that helps!
I had to make more modifications. If I do as @jasonodonnell proposed, the hostname is not substituted, so vault status reads:
/ $ vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 1
Threshold 1
Version 1.4.0-beta1
Cluster Name vault-cluster-769d437c
Cluster ID f0c86e29-32aa-9626-745d-11c8fc5c9083
HA Enabled true
HA Cluster https://$(HOSTNAME):8201
HA Mode active
So I had to add:
- name: HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
and
- name: VAULT_CLUSTER_ADDR
  value: "https://$(HOST_NAME).vault-headless:8201"
inside the env section of the server-statefulset yaml template file. Also note that I used $(HOST_NAME).vault-headless because that is the only record that resolves inside the pods; using just the hostname won't resolve, for some weird reason.
And now after running kubectl delete pod vault-2 I ran into this (log from vault-0):
2020-03-09T11:19:10.225Z [DEBUG] storage.raft: failed to contact: server-id=$(10.10.117.96) time=32.177825809s
2020-03-09T11:19:11.128Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-64efb27c-0356-3e72-890b-d1d68148edc6
2020-03-09T11:19:11.154Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-2.vault-headless:8201 error="dial tcp: lookup vault-2.vault-headless on 10.10.0.3:53: no such host"
2020-03-09T11:19:12.707Z [DEBUG] storage.raft: failed to contact: server-id=$(10.10.117.96) time=34.659817361s
Somehow it won't resolve vault-2.vault-headless from the vault-0 pod, but nslookup works inside the pod:
$ kubectl exec -it vault-0 nslookup vault-2.vault-headless 10.10.0.3
Server: 10.10.0.3
Address 1: 10.10.0.3 coredns.kube-system.svc.in....
Name: vault-2.vault-headless
Address 1: 10.10.117.97 10-10-117-97.vault.default.svc.in....
The Raft configuration shows three nodes, but one has the wrong IP (the one from before the pod deletion).
Other than deleting the pod, what would be an appropriate way to simulate a pod going missing?
EDIT: Seems I got it working, but I will have to use the hostname instead of the pod IP as node_id in the future to avoid confusion :nerd_face:
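For illustration, a hostname-based storage stanza might look like the sketch below; it assumes the chart substitutes a HOSTNAME placeholder at startup the same way it substitutes POD_IP, which would need to be verified against the chart's entrypoint:
storage "raft" {
  path    = "/vault/data"
  # Sketch: use the stable pod hostname as the node ID instead of the pod IP,
  # assuming HOSTNAME is substituted at startup like POD_IP is.
  node_id = "HOSTNAME"
}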
Hi @ngarafol, can you update the issue with a clearer set of instructions? Also, is the Raft backend working well for you? I would really like to move towards Raft and away from Consul.
Hi @webmutation. This issue is based on the instructions from @jasonodonnell listed here: https://github.com/hashicorp/vault-helm/issues/40
Basically, you need vault-helm master plus the files from https://github.com/hashicorp/vault-helm/pull/58 merged locally. I am using transit unseal, but it doesn't matter how you unseal Vault.
Since it is safer to use hostnames than IP addresses, you can edit the settings like I did here: https://github.com/hashicorp/vault/issues/8489#issuecomment-596484299
Regarding Raft itself, I have only been using it for a few days, so I can't comment at the moment. We also have a Consul-backed setup, but I am testing the Raft one.
If all this is still too brief, @jasonodonnell or I can try to write a more detailed guide when time permits.
Thanks. That should be enough to get me going.
The part that was less clear to me was the settings file. I am unsure what changed regarding the hostname: whether it was only what you commented on, or whether you did something more. Indeed, it would seem that DNS instead of IP is the only way to resolve this... but my main worry is what happens if the pod gets rescheduled and is not terminated with vault operator raft remove-peer <peer id>. Did you observe any split-brain situations so far?
Yeah, we use Consul as well. I know it is the best-supported backend, but it seems like huge overkill. Embedded Raft would be more lightweight and easier to manage.
@catsby
Not 100% sure, but thinking aloud: removing the Raft peer is not necessary here. I wanted the peer with the same ID (and a new IP) to return to the cluster. If you remove it, you have to manually join the peer to the leader again, and I don't want to do that. I wanted to test HA resilience by simulating a (probably bad) example: deleting a pod. EDIT: Or could I be wrong? Since PVCs are used, will the new node know who the leader is and try to connect to it again? If that is true, what happens if the leader changes to a different node before the new node boots up?
Hi @ngarafol,
The following environment variable needs to be added to the Vault StatefulSet for this to work:
- name: VAULT_CLUSTER_ADDR
  value: "https://$(HOSTNAME):8201"
This will change Vault to use DNS names instead of IP addresses when tracking nodes in the cluster.
Hope that helps!
I have the same question as this post.
I do believe that using the hostname will help. However, what I really want is a way to auto-rejoin when a new pod is deployed after the old pod is deleted.
Right now, the only way is to use a retry_join config with fixed entries.
Update in 2022: auto-rejoin can be achieved via auto-join with k8s as the provider. Refer to: https://github.com/hixichen/deploy-open-source-vault-on-gke/blob/main/helm/values-dev.yaml#L116
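For reference, a retry_join stanza using Kubernetes auto-join might look roughly like the sketch below; the namespace and label selector are placeholders and would need to match the actual deployment:
storage "raft" {
  path = "/vault/data"

  retry_join {
    # Sketch: discover peers through the Kubernetes API instead of listing fixed entries.
    # The namespace and label_selector values are hypothetical placeholders.
    auto_join        = "provider=k8s namespace=vault label_selector=\"app.kubernetes.io/name=vault,component=server\""
    auto_join_scheme = "https"
  }
}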
Auto-rejoin works for me, as I said. I deleted a node (pod) and the new node (pod) automatically rejoined, since by Raft node ID it is the same node...
No, it seems that it is not possible to recover a Raft cluster if IP addresses are used and they change.
I have deployed the Helm chart hashicorp/vault-helm in HA mode with Raft and 3 nodes. By default it injects POD_IP addresses everywhere and the Raft setup looks like:
$ vault operator raft list-peers
Node Address State Voter
---- ------- ----- -----
91ba5725-c624-9915-1fbb-3a8ec171e29f 100.96.12.86:8201 leader true
d2b72ece-c095-4289-0ee1-a29d60b84324 100.96.14.119:8201 follower true
f712c3ed-c2a2-9b7d-f83c-effaad8a99af 100.96.8.104:8201 follower true
If I then take down all the Vault nodes by deleting the Helm chart with $ helm delete --purge vault (leaving the PVCs and PVs intact, so the storage is not removed) and deploy the same Helm chart again, my Kubernetes cluster assigns completely different IP addresses to all Vault nodes, and I end up in the following situation that is impossible to recover from (almost no command works):
$ vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 1
Threshold 1
Version 1.4.2
Cluster Name vault-cluster-c8fdde71
Cluster ID 8fccaa29-df37-4211-9dfb-17f5d5393a8d
HA Enabled true
HA Cluster https://100.96.12.86:8201
HA Mode standby
Active Node Address https://100.96.12.86:8200
Raft Committed Index 2652
Raft Applied Index 2652
$ vault token lookup
Error looking up token: context deadline exceeded
$ vault operator raft list-peers
Error reading the raft cluster configuration: context deadline exceeded
$ vault operator raft join https://vault-api-addr:8200
Error joining the node to the raft cluster: Error making API request.
URL: POST https://127.0.0.1:8200/v1/sys/storage/raft/join
Code: 500. Errors:
* raft storage is already initialized
{"@level":"info","@message":"entering candidate state","@module":"storage.raft","@timestamp":"2020-06-17T13:28:56.379002Z","node":{},"term":544}
{"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2020-06-17T13:28:56.380535Z","alpn":"raft_storage_v1","host":"raft-a906b8db-1279-2d66-4075-be3f5f55b544"}
{"@level":"debug","@message":"votes","@module":"storage.raft","@timestamp":"2020-06-17T13:28:56.382887Z","needed":2}
{"@level":"debug","@message":"vote granted","@module":"storage.raft","@timestamp":"2020-06-17T13:28:56.382932Z","from":"d2b72ece-c095-4289-0ee1-a29d60b84324","tally":1,"term":544}
{"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2020-06-17T13:28:56.382978Z","alpn":"raft_storage_v1","host":"raft-a906b8db-1279-2d66-4075-be3f5f55b544"}
{"@level":"debug","@message":"forwarding: error sending echo request to active node","@module":"core","@timestamp":"2020-06-17T13:28:58.580081Z","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 100.96.12.86:8201: i/o timeout\""}
{"@level":"error","@message":"failed to make requestVote RPC","@module":"storage.raft","@timestamp":"2020-06-17T13:29:00.270019Z","error":"dial tcp 100.96.12.86:8201: i/o timeout","target":{"Suffrage":0,"ID":"91ba5725-c624-9915-1fbb-3a8ec171e29f","Address":"100.96.12.86:8201"}}
{"@level":"error","@message":"failed to make requestVote RPC","@module":"storage.raft","@timestamp":"2020-06-17T13:29:00.273109Z","error":"dial tcp 100.96.8.104:8201: i/o timeout","target":{"Suffrage":0,"ID":"f712c3ed-c2a2-9b7d-f83c-effaad8a99af","Address":"100.96.8.104:8201"}}
{"@level":"debug","@message":"forwarding: error sending echo request to active node","@module":"core","@timestamp":"2020-06-17T13:29:03.580091Z","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 100.96.12.86:8201: i/o timeout\""}
{"@level":"warn","@message":"Election timeout reached, restarting election","@module":"storage.raft","@timestamp":"2020-06-17T13:29:03.957558Z"}
As you can see, it attempts to connect to the previous active node's IP address (100.96.12.86), but there is no Vault node on that IP anymore. And with vault operator raft join it is not possible to join a valid Vault cluster, because the Raft storage is already initialized. The only solution is to use DNS everywhere, as @jasonodonnell suggested, or you risk losing access to Vault after a disaster.
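For completeness, a lost-quorum state like this can sometimes be recovered with the documented peers.json procedure rather than DNS. A rough sketch, assuming the node IDs from the list-peers output above and a storage path of /vault/data as elsewhere in this thread; the NEW_IP_* addresses are placeholders for the pods' new IPs, the file goes into the raft subdirectory of the storage path on every node, and the data should be backed up first:
$ cat > /vault/data/raft/peers.json <<EOF
[
  { "id": "91ba5725-c624-9915-1fbb-3a8ec171e29f", "address": "NEW_IP_OF_VAULT_0:8201", "non_voter": false },
  { "id": "d2b72ece-c095-4289-0ee1-a29d60b84324", "address": "NEW_IP_OF_VAULT_1:8201", "non_voter": false },
  { "id": "f712c3ed-c2a2-9b7d-f83c-effaad8a99af", "address": "NEW_IP_OF_VAULT_2:8201", "non_voter": false }
]
EOF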
If the node was removed via remove-peer, you'd have to clear out its Raft data first (i.e. the directory specified in the config's storage path) in order to rejoin it to the cluster. It'd be good to take a backup of that directory, or move it elsewhere, before you do so, just in case.
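A rough sketch of what that could look like for the removed node, assuming the /vault/data path used earlier in this issue and ideally done while the Vault server on that node is stopped (the exact file layout may differ between versions):
/ $ # Move the old Raft state aside before letting this node join the cluster again.
/ $ mv /vault/data/raft /vault/data/raft.bak
/ $ mv /vault/data/vault.db /vault/data/vault.db.bak
and then restart the pod and run vault operator raft join against the active node again.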
@jasonodonnell cc : @hixichen
I am running integrated storage with Raft, version 1.9.3.
I have DNS set up to use the headless service:
- name: VAULT_CLUSTER_ADDR
  value: https://$(HOSTNAME).musw2-0-vault-internal:8201
This is my retry_join:
storage "raft" {
  path = "/vault/data"

  retry_join {
    leader_api_addr = "https://musw2-0-vault-0.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-1.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-2.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-3.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-4.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
}
I can confirm the DNS entries are correct
and
The other nodes are getting connection refused, cannot join the cluster, and the heartbeat is failing.
Logs:
2022-06-24T19:21:45.815Z [ERROR] storage.raft: failed to heartbeat to: peer=musw2-0-vault-1.musw2-0-vault-internal:8201 error="dial tcp 172.20.9.24:8201: connect: connection refused"
2022-06-24T19:21:46.050Z [INFO] http: TLS handshake error from 10.241.247.199:4602: EOF
2022-06-24T19:21:46.540Z [INFO] http: TLS handshake error from 10.241.247.198:55618: EOF
2022-06-24T19:21:46.645Z [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter f8cf0d96-a735-2172-90da-111e83423303 musw2-0-vault-1.musw2-0-vault-internal:8201}" error="dial tcp 172.20.9.24:8201: connect: connection refused"
2022-06-24T19:21:47.089Z [WARN] core.cluster-listener: no TLS config found for ALPN: ALPN=["h2", "http/1.1"]
I have two separate clusters running and have run vault operator init on the one above.
The other cluster has similar logs from all nodes and none are unsealed. I am using Azure Key Vault for auto-unseal.
This is critical for our implementation. We are Enterprise customers and I will be reaching out, but I wanted to post here as well.
Thanks.
@fewknow Thanks for sharing, but I think your issue is more related to the cluster being sealed. The original issue I had (OP) was that Raft rejoin would not work on an already unsealed cluster, because IP addresses were used instead of FQDNs.
@ngarafol - yes, my issue was just the ports; changing 8201 to 8200 solved it. Sorry about the noise.
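For anyone else hitting this, the fix amounts to pointing leader_api_addr at the Vault API port rather than the cluster port. A sketch based on the retry_join block above, with only the port changed:
retry_join {
  # leader_api_addr must point at the Vault API port (8200), not the cluster port (8201).
  leader_api_addr = "https://musw2-0-vault-0.musw2-0-vault-internal:8200"
  leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
  leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
  leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
}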
I suspect that the issue is related to the setup/configuration (in Azure?).
Hey @ngarafol, do you still require further input here, or is it okay to close? Sorry I'm late here and trying to understand what's next.
Has this issue been reproduced in a current version of Vault? Please let me know if this is still applicable. Thanks!
The original issue was due to IPs being used instead of FQDNs. I believe that as long as FQDNs are used, this issue does not exist at all. Will close; feel free to reopen.