Standby Status 474 after Update to 1.19.0
Hi,
OS: Ubuntu 24.04. AWS EC2 cluster: 3 Vault servers (two on standby) and 5 Consul storage servers.
Updated from v1.18.5 to v1.19.0. The standby servers now come up with HTTP status 474 instead of 429.
Purged v1.19.0 and reinstalled v1.18.5, and the status went back to 429.
I was able to reproduce this on our DR cluster.
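The code is easy to observe straight off the standby's health endpoint, e.g.:

# From a standby node: returns 429 on 1.18.x, 474 after the 1.19.0 upgrade
curl -sk -o /dev/null -w '%{http_code}\n' https://127.0.0.1:8200/v1/sys/health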
Thanks.
Mike
Is anyone going to bother to look at this error?
Can you please provide more information about your setup and the specific errors you're getting? For example, from the bug report template:
Environment:
- Vault Server Version (retrieve with vault status):
- Vault CLI Version (retrieve with vault version):
- Server Operating System/Architecture:

Vault server configuration file(s):

# Paste your Vault config here.
# Be sure to scrub any sensitive values
Thanks!
Hi,
As stated above: Ubuntu 24.04, AWS EC2 cluster, 3 Vault servers (two on standby), 5 Consul storage servers.
From the standby server, the current status is 429.
Before Upgrade:
vault version
Vault v1.18.0 (77f26ba561a4b6b1ccd5071b8624cefef7a72e84), built 2024-10-08T09:12:52Z

vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         1.18.0
Build Date      2024-10-08T09:12:52Z
Storage Type    consul
Cluster Name    vault-cluster-6bd3c00d
Cluster ID      8440b133-5d4f-8e77-78ac-502a5e87df30
HA Enabled      true
HA Cluster      https://vault-dr.cloud.triciti.com:8201
HA Mode         active
Active Since    2025-03-27T17:42:13.759605605Z
After Upgrade:
Standby now status 474.
vault version
Vault v1.19.0 (7eeafb6160d60ede73c1d95566b0c8ea54f3cb5a), built 2025-03-04T12:36:40Z
The status output is otherwise the same; the cluster is up and running.
Error from Standby:
2025-03-27T18:00:15.784303+00:00 ip-10-61-2-11 consul[552]: 2025-03-27T18:00:15.783Z [INFO] agent: Synced check: check=vault:vault-dr.cloud.triciti.com:8200:vault-sealed-check
2025-03-27T18:01:17.769923+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:01:17.769Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
2025-03-27T18:01:17.770307+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:01:17.769Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2025-03-27T18:01:17.770355+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:01:17.769Z [INFO] core: vault is unsealed
2025-03-27T18:01:17.770396+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:01:17.769Z [WARN] service_registration.consul: concurrent initialize state change notify dropped
2025-03-27T18:01:17.770438+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:01:17.769Z [INFO] core: entering standby mode
2025-03-27T18:01:17.784105+00:00 ip-10-61-2-11 consul[552]: 2025-03-27T18:01:17.783Z [INFO] agent: Synced check: check=vault:vault-dr.cloud.triciti.com:8200:vault-sealed-check
2025-03-27T18:01:38.026468+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:01:38.025Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.61.2.43:8201: i/o timeout""
2025-03-27T18:01:38.026628+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:01:38.026Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2025-03-27T18:02:12.863686+00:00 ip-10-61-2-11 consul[552]: 2025-03-27T18:02:12.862Z [INFO] agent: Synced service: service=vault:vault-dr.cloud.triciti.com:8200
2025-03-27T18:02:12.873369+00:00 ip-10-61-2-11 consul[552]: 2025-03-27T18:02:12.873Z [INFO] agent: Synced check: check=vault:vault-dr.cloud.triciti.com:8200:vault-sealed-check
2025-03-27T18:02:40.140231+00:00 ip-10-61-2-11 vault[568]: 2025-03-27T18:02:40.139Z [INFO] http: TLS handshake error from 127.0.0.1:42484: remote error: tls: bad certificate
If I remove version 1.19.0 and go back to 1.18, the standby status goes back to 429.
Please let me know if you need any additional information.
Thanks for your help.
Mike
I fixed the bad certificate error, but got the same results.
vault[1349]: 2025-03-27T18:14:32.984Z [INFO] core: entering standby mode
2025-03-27T18:14:32.998921+00:00 ip-10-61-2-11 consul[552]: 2025-03-27T18:14:32.998Z [INFO] agent: Synced check: check=vault:vault-dr.cloud.triciti.com:8200:vault-sealed-check
2025-03-27T18:14:53.249455+00:00 ip-10-61-2-11 vault[1349]: 2025-03-27T18:14:53.248Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.61.2.43:8201: i/o timeout""
2025-03-27T18:14:53.252042+00:00 ip-10-61-2-11 vault[1349]: 2025-03-27T18:14:53.249Z [ERROR] core: forward request error: error="error during forwarding RPC request"
Health checks failed with these codes: [474]
Thanks.
Mike
Thanks! 474 indicates the standby node can't talk to the active node. When the cluster is running, is the Vault port for your active node (10.61.2.43:8201) accessible?
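For example, something like this run from the standby (a rough sketch):

# Can the standby open a TCP connection to the active node's cluster port?
nc -zv -w 5 10.61.2.43 8201

(For reference, the default sys/health codes are 200 active, 429 standby, 472 DR secondary, 473 performance standby, 501 not initialized, 503 sealed; 474 is the new code for a standby that can't reach the active node.)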
Hi.
No, but all previous versions worked. The main URL is an internal load balancer with only port 8200 open; in all the years we've used it, it has never forwarded port 8201. The IP 10.61.2.43:8201 is the load balancer. The actual servers have port 8201 open to each other, but not to the load balancer.
What changed? What is different in v1.19.0? My other standby, using the exact same security policies, subnets, and open ports, is working just fine.
It looks like v1.19.0 is now using the load balancer address instead of the actual server addresses, as previous versions did.
Thanks.
Mike
Thanks for that information! I'm wondering if this PR changed the functionality: https://github.com/hashicorp/vault/pull/28991 I'll check with our engineering teams.
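In the meantime, one thing that may be worth trying is pinning each node's cluster address explicitly, so standby-to-active forwarding dials the node directly rather than a derived address. A rough sketch of the relevant settings (the path and IP are placeholders for each node's own values, not a verified fix):

# Append to each node's config, e.g. /etc/vault.d/vault.hcl, then restart Vault.
sudo tee -a /etc/vault.d/vault.hcl >/dev/null <<'EOF'
# api_addr can keep pointing at the load balancer for client redirects,
# but cluster_addr should be this node's own reachable address so
# request forwarding bypasses the LB.
api_addr     = "https://vault-dr.cloud.triciti.com:8200"
cluster_addr = "https://10.61.2.11:8201"
EOF
sudo systemctl restart vault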
server_1 is running v1.19.0; server_2 and server_0 are running v1.18.0.
You can see that the v1.18.0 standby runs fine on the same system, without error.
Thanks.
Mike
This is interesting: here is the successful log from v1.18.0, yet it shows the same port 8201 error.
Consul shows three synced checks here, while the log above shows only one.
2025-03-27T21:31:14.976817+00:00 ip-10-61-3-12 vault[571]: 2025-03-27T21:31:14.976Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
2025-03-27T21:31:14.976924+00:00 ip-10-61-3-12 vault[571]: 2025-03-27T21:31:14.976Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2025-03-27T21:31:14.976960+00:00 ip-10-61-3-12 vault[571]: 2025-03-27T21:31:14.976Z [INFO] core: vault is unsealed
2025-03-27T21:31:14.976994+00:00 ip-10-61-3-12 vault[571]: 2025-03-27T21:31:14.976Z [WARN] service_registration.consul: concurrent initialize state change notify dropped
2025-03-27T21:31:14.977027+00:00 ip-10-61-3-12 vault[571]: 2025-03-27T21:31:14.976Z [INFO] core: entering standby mode
2025-03-27T21:31:14.991522+00:00 ip-10-61-3-12 consul[554]: 2025-03-27T21:31:14.990Z [INFO] agent: Synced check: check=vault:vault-dr.cloud.triciti.com:8200:vault-sealed-check
2025-03-27T21:31:29.661197+00:00 ip-10-61-3-12 consul[554]: 2025-03-27T21:31:29.660Z [INFO] agent: Synced service: service=vault:vault-dr.cloud.triciti.com:8200
2025-03-27T21:31:29.673019+00:00 ip-10-61-3-12 consul[554]: 2025-03-27T21:31:29.672Z [INFO] agent: Synced check: check=vault:vault-dr.cloud.triciti.com:8200:vault-sealed-check
2025-03-27T21:31:35.232203+00:00 ip-10-61-3-12 vault[571]: 2025-03-27T21:31:35.231Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.61.3.62:8201: i/o timeout""
2025-03-27T21:31:35.232553+00:00 ip-10-61-3-12 vault[571]: 2025-03-27T21:31:35.231Z [ERROR] core: forward request error: error="error during forwarding RPC request"
Thanks again.
Mike
Oh, that is interesting! I know the status codes changed in 1.19, so I'm going to follow up and make sure that it's doing what is intended, and that it's thoroughly documented. Really appreciate your patience!
Hi,
When I enter the keys to unseal the standby node, it shows: "Error: This is a standby Vault node but can't communicate with the active node via request forwarding. Sign in at the active node to use the Vault UI."
Thanks.
Mike
Vault 1.19.3 still returns 474. Is there a plan to fall back to the original behavior, or is there any workaround?
Hi, the issue still exists with the new release, v1.20.0. I believe I now know why: the status code starts as 429, then changes to 474 after the following error.
vault[575]: 2025-07-09T19:07:46.017Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: tls: failed to verify certificate: x509: certificate is valid for *.cloud.triciti.com, cloud.triciti.com, not fw-8d1bc8b5-81e2-3194-e92c-f54472f05794""
You must have updated the error output, because before all it said was "bad certificate."
Now it says the cert is valid for the correct domain, but not for fw-8d1bc8b5-81e2-3194-e92c-f54472f05794.
What is fw-8d1bc8b5-81e2-3194-e92c-f54472f05794?
It's not the cert identifier or ARN, and it's not the name of the internal load balancer.
It seems to be evaluating the wrong item/variable. The new error is quite explicit.
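One way to see exactly which names the certificate served at that address carries (a sketch; substitute the real host):

# Dump the subject and SANs of whatever cert the remote listener presents
openssl s_client -connect 10.61.2.43:8201 \
  -servername fw-8d1bc8b5-81e2-3194-e92c-f54472f05794 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -ext subjectAltName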
Your developers must know what they changed. Unless you hired developers from Microsoft.
Thanks.
Mike
Thank you for the additional information! That's super helpful. I'll see what I can find out - thanks again! :)
@heatherezell - We're seeing this issue as well. Health checks fail on the 474 status code and our quorum becomes unavailable.
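A possible stopgap on the health-check side, assuming the target group can probe a custom path (we haven't verified whether standbyok also masks the new 474 state):

# Report standbys as healthy (200) instead of 429
curl -sk -o /dev/null -w '%{http_code}\n' \
  'https://127.0.0.1:8200/v1/sys/health?standbyok=true'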