No longer possible to use label_selector in auto_join field

tedo-benchling opened this issue 9 months ago • 5 comments

Describe the bug

After https://github.com/hashicorp/vault/pull/29228, using a k8s label selector fails with a parsing error, caused by the new validation performed on the auto_join field.

To Reproduce

Steps to reproduce the behavior:

  1. Set the auto_join in retry_join to provider=k8s namespace=vault label_selector=\"app.kubernetes.io/name=vault,component=server\""
  2. Try to start vault
  3. See error: vault error loading configuration from /tmp/storageconfig.hcl: error parsing 'storage': malformed auto_join pair label_selector="app.kubernetes.io/name=vault,, expected key=value
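The error in step 3 shows the pair cut off right after "vault,", which is exactly where the configuration file below has a space. That suggests the new validation splits the auto_join string on whitespace and then expects a single key=value per chunk, without honouring double quotes. As a hedged illustration (not Vault's or go-discover's actual code), here is a minimal Go sketch of a quote-aware parser that keeps the label selector intact. The function name parseAutoJoin is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAutoJoin is a hypothetical, quote-aware parser for go-discover style
// `key=value key2="quoted value"` strings. It splits pairs on unquoted
// whitespace and each pair on its FIRST '=' only, so values may contain '='
// and, when double-quoted, spaces and commas.
func parseAutoJoin(s string) (map[string]string, error) {
	pairs := map[string]string{}
	var tokens []string
	var cur strings.Builder
	inQuotes := false
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '\\' && inQuotes && i+1 < len(s):
			// Keep an escaped character (e.g. \") literally inside quotes.
			i++
			cur.WriteByte(s[i])
		case c == '"':
			// Quotes only toggle state; they are not part of the value.
			inQuotes = !inQuotes
		case (c == ' ' || c == '\t') && !inQuotes:
			// Unquoted whitespace ends the current key=value token.
			if cur.Len() > 0 {
				tokens = append(tokens, cur.String())
				cur.Reset()
			}
		default:
			cur.WriteByte(c)
		}
	}
	if inQuotes {
		return nil, fmt.Errorf("unterminated quote in %q", s)
	}
	if cur.Len() > 0 {
		tokens = append(tokens, cur.String())
	}
	for _, tok := range tokens {
		key, value, ok := strings.Cut(tok, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("malformed pair %q, expected key=value", tok)
		}
		pairs[key] = value
	}
	return pairs, nil
}

func main() {
	// The string as Vault sees it once HCL has unescaped the \" sequences.
	in := `provider=k8s namespace=vault label_selector="app.kubernetes.io/name=vault, component=server"`
	pairs, err := parseAutoJoin(in)
	if err != nil {
		panic(err)
	}
	// prints: app.kubernetes.io/name=vault, component=server
	fmt.Println(pairs["label_selector"])
}
```

The design choice that matters is splitting each token on its first '=' only: a Kubernetes label selector legitimately contains '=' itself, so a "exactly one '=' per pair" check will always reject it.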

Expected behavior

The server starts up correctly and uses the label_selector as specified.

Environment:

  • Vault Server Version (retrieve with vault status): 1.19.0
  • Vault CLI Version (retrieve with vault version): N/A
  • Server Operating System/Architecture: Amazon Linux

Vault server configuration file(s):

storage "raft" {
  path                         = "/vault/data"
  autopilot_reconcile_interval = "10s" # default is "10s"
  autopilot_update_interval    = "2s"  # default is "2s"
  # Ref: https://developer.hashicorp.com/vault/tutorials/operations/performance-tuning#performance_multiplier
  performance_multiplier = 1

  retry_join {
    auto_join             = "provider=k8s namespace=vault label_selector=\"app.kubernetes.io/name=vault, component=server\""
    auto_join_scheme      = "https"
    leader_tls_servername = "vault"
    # Still need these specified when using a self-signed cert
    leader_ca_cert_file     = "/vault/userconfig/tls-ca/ca.crt"
    leader_client_cert_file = "/vault/userconfig/tls-server/tls.crt"
    leader_client_key_file  = "/vault/userconfig/tls-server/tls.key"
  }
}


tedo-benchling, Mar 07 '25 16:03

Thank you so much for the report, @tedo-benchling! We especially appreciated the extra effort you spent on identifying the PR that caused this. Look for this to be fixed in the next release. :)

heatherezell, Mar 07 '25 22:03

@heatherezell Hi, I still encountered this error in 1.19.1 during a fresh installation, using the same configuration that worked fine in 1.18.5:

failed to parse addresses from auto-join metadata: discover: label_selector: - equals in key's value, enclosing double-quote needed label_selector="value-with-=-symbol""

retry_join {
  auto_join        = "provider=k8s label_selector=\"app.kubernetes.io/name=vault,component=server\" namespace=\"vault-cluster\""
  auto_join_scheme = "http"
}

Edit: After adding a space between the comma and component in label_selector, the error is gone. Strangely, 1.18.5 and earlier versions work fine without the space.

age9990, Apr 14 '25 10:04

I seem to be having this issue on 1.19.1 as well. I can't find the fix in the release notes for 1.19.1, although it is listed in the changelog. I tried to find out whether something else had changed, but I can't confirm that it was fixed in a subsequent release.

BigMacIT, May 20 '25 16:05

It looks like this is actually a bug in go-discover: the re-encoded representation of the parsed config (which we now use) is not correct.

Given the original configuration of: "provider=k8s namespace=vault label_selector=\"app.kubernetes.io/name=vault,component=server\""

We parse and normalize it, then re-encode it with config.String(), which gives: provider=k8s label_selector=app.kubernetes.io/name=vault,component=server namespace=vault

when it ought to be: provider=k8s label_selector=\"app.kubernetes.io/name=vault,component=server\" namespace=vault
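The round-trip problem can be reproduced in isolation. If a serializer quotes a value only when it contains whitespace or quotes, a value such as app.kubernetes.io/name=vault,component=server (which contains '=' but no space) is written out bare, and a strict parser later rejects it with "enclosing double-quote needed". One plausible reading, which is an inference rather than a reading of go-discover's source, is that this is also why the space workaround helps: the space makes the value get quoted again. The Go sketch below shows a re-encoder that also quotes values containing '='; encodeAutoJoin and needsQuoting are hypothetical names, and the key order (provider first, then the rest sorted) is chosen only to match the output quoted above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// needsQuoting reports whether a value must be double-quoted to survive a
// round trip through a whitespace-separated key=value format. Including '='
// is the important part: without it, a label selector such as
// "app.kubernetes.io/name=vault,component=server" is emitted bare.
func needsQuoting(v string) bool {
	return v == "" || strings.ContainsAny(v, " \t\"\\=")
}

// encodeAutoJoin is a hypothetical re-encoder: provider first, then the
// remaining keys in sorted order, quoting any value that needs it.
func encodeAutoJoin(pairs map[string]string) string {
	keys := make([]string, 0, len(pairs))
	for k := range pairs {
		if k != "provider" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if _, ok := pairs["provider"]; ok {
		keys = append([]string{"provider"}, keys...)
	}
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		v := pairs[k]
		if needsQuoting(v) {
			v = strconv.Quote(v) // adds quotes and escapes \ and "
		}
		parts = append(parts, k+"="+v)
	}
	return strings.Join(parts, " ")
}

func main() {
	pairs := map[string]string{
		"provider":       "k8s",
		"namespace":      "vault",
		"label_selector": "app.kubernetes.io/name=vault,component=server",
	}
	// prints: provider=k8s label_selector="app.kubernetes.io/name=vault,component=server" namespace=vault
	fmt.Println(encodeAutoJoin(pairs))
}
```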

Until we get it fixed upstream, I'd recommend adding a space between the comma-separated requirements in the label selector (e.g. name=vault, component=server).

ryancragun, May 20 '25 21:05

Not the answer you're likely seeking, but there is a workaround (and my preferred method) for joining nodes in a Kubernetes cluster that doesn't talk to the Kubernetes API directly at all: we rely on DNS records for headless Services.

Given your settings in the example above, the following would be the equivalent while relying on Kubernetes' built-in DNS feature and the Vault Helm chart's headless service, named vault-internal, created by default.

storage "raft" {
  # ... snip ...

  retry_join {
    leader_api_addr = "http://vault-internal:8200"
  }
}

This resolves all IPs (IPv4 or IPv6) of the pods in the service named vault-internal. That's what the auto_join block was looking to do anyway, now with fewer moving parts. (docs)

TheLonelyGhost, May 29 '25 03:05

1.19.5 fixed the issue ✅

isaac88, Jun 06 '25 11:06

This is fixed for me as well. Going to close.

tedo-benchling, Aug 20 '25 17:08