consul-template
Leased secrets are not renewed on token change when using vault-agent
Consul Template version
v0.25.1
Configuration
vault-agent
vault {
  address = "http://localhost:8200"
}
auto_auth {
  method "approle" {
    config = {
      role_id_file_path = "./roleid"
      secret_id_file_path = "./secretid"
      remove_secret_id_file_after_reading = false
    }
  }
  sink "file" {
    config = {
      path = "./token"
    }
  }
}
cache {
  use_auto_auth_token = false
}
listener "tcp" {
  address = "127.0.0.1:8100"
  tls_disable = true
}
template
{{ with secret "consul/creds/foo"}} {{ . }} {{end}}
Command
consul-template -vault-agent-token-file=./token -log-level=debug -template secret.tpl:secret.out
Expected behavior
consul-template reads the Vault token from -vault-agent-token-file and renders secrets according to the template. When the Vault token changes on disk, consul-template reloads it and refreshes the secrets in the template.
Actual behavior
The Vault token obtained via vault-agent has max_ttl=60m; secrets obtained from consul/creds/foo have ttl=50m.
consul-template starts, grabs the vault-token from the configured location, grabs the leased secret from vault and renders the template correctly.
Every ~17+ minutes the dependency watcher triggers a run and the leased secret from consul/creds/foo is refreshed. Fast forward to T+60m: the Vault token reaches its max_ttl and can no longer be renewed. vault-agent re-authenticates and gets a new token, and all leased secrets generated by the old token are revoked by Vault.
consul-template reloads the token from the file and updates its Vault client, but it doesn't refresh the templates. If the last lease renewal from consul-template happens close to the Vault token's max_ttl, the templates are left with revoked secrets for an entire sleep loop.
The Fetch function for secrets (e.g. https://github.com/hashicorp/consul-template/blob/master/dependency/vault_read.go#L64-L68) has no easy way that I can see to force a refresh, unless some refactoring/replumbing happens.
Steps to reproduce
It's easy to reproduce using the sample configuration above: set the token max_ttl to 3 minutes and the secret backend role TTL to 2 minutes.
Hey @danieleva, thanks for reporting this.
I did some brief testing and have verified that it should be picking up on the token file change and reading in the new token. I tested this manually by starting CT with a blank template and the -vault-agent-token-file argument, then watched the logs as I touched or overwrote the contents of that file.
The token is then set on the Vault client using the standard call.
Basically using a simple, quick setup I tested what I thought were the 2 key systems and I don't see a problem with either. I'm currently doing just a quick sweep, so that's it for now.
hi, thanks for looking at this. The function that reloads the token from file is working as expected, and it updates the Vault client correctly. The problem is a missing hook/signal to trigger a force-refresh of all leased secrets. The loop responsible for that is not affected by the change in Vault client, so there is a delta between Vault client being updated with a new token and Vault client being used to request secrets. During that time all the secrets generated with the old token are at risk of being revoked by Vault server.
Thanks for the followup. I think I get it now. To restate, to be sure...
When the agent token is refreshed, it needs to trigger a general refresh of all Vault values instead of letting them wait for the usual timeout.
That's spot on 👍 :)
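A minimal sketch of the behavior agreed on above, for illustration only. The `watcher` and `leasedSecret` types and the `fetch` callback are hypothetical stand-ins, not consul-template's actual types; the point is just that a token change forces an immediate refetch of every leased secret rather than leaving each one on its own timer.

```go
package main

import (
	"fmt"
	"sync"
)

// leasedSecret is a hypothetical stand-in for a Vault dependency.
type leasedSecret struct {
	path      string
	fetchedBy string // token that produced the currently rendered value
}

// watcher is a hypothetical holder for the Vault token and the secrets
// that were fetched with it.
type watcher struct {
	mu      sync.Mutex
	token   string
	secrets []*leasedSecret
	fetch   func(token, path string) // hypothetical fetch callback
}

// SetToken swaps the token and, if it actually changed, refetches every
// leased secret right away instead of waiting for the usual timeout.
// It returns the number of secrets that were refetched.
func (w *watcher) SetToken(token string) (refreshed int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if token == w.token {
		return 0 // file was touched but the token is the same
	}
	w.token = token
	for _, s := range w.secrets {
		w.fetch(token, s.path)
		s.fetchedBy = token
		refreshed++
	}
	return refreshed
}

func main() {
	w := &watcher{
		token:   "old-token",
		secrets: []*leasedSecret{{path: "consul/creds/foo", fetchedBy: "old-token"}},
		fetch: func(token, path string) {
			fmt.Printf("refetching %s with %s\n", path, token)
		},
	}
	fmt.Println("refreshed:", w.SetToken("new-token"))
	fmt.Println("refreshed:", w.SetToken("new-token")) // unchanged token
}
```

A real fix would also have to handle fetch errors and re-render the templates after the refetch; both are left out of this sketch.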
Any updates on this?
Hey @drawks,
This will probably be included in 0.27.1. I'm preparing 0.27.0 now. It was initially supposed to be a quick release to update the docker image, but it blew up a bit when I tried to squeeze in a shell parsing fix that turned out not to be fixable without rewriting a large part of that library, which led me to drop it. While I'd like to include this in 0.27.0, I think I've delayed that release too long already.
So I'll start working on it for the next-next release. Sorry I don't have better news.
Hi, did this ever get fixed? I believe this is the root cause of the issues I'm seeing with 0.29.0 so I'm guessing not?
I am using the Vault Kubernetes integration to inject a Vault Agent into my pods, the agent then auths using the pod service account and writes the token to a shared volume. When the token expires and is renewed the agent sidecar writes a new token in. Consul-template runs in the main application container in exec mode so it can refresh credentials and restart the application process. This all works fine.
Except when my K8s token reaches max_ttl and a new token is written, consul template blows up trying to renew database credential leases that have been revoked because the parent token has expired. Or worse, consul-template fetches new database credentials, restarts the application and then seconds later the token expires and the (brand new) DB creds are deleted from underneath the app.
I can mitigate this somewhat by ensuring the Vault k8s auth method has a very high max_ttl, but it isn't a foolproof solution. If a pod somehow manages to stay running for long enough it'll still hit this issue. It's also obviously not ideal to have to set the auth token super high, it would be preferable to actually let the auth token expire in a reasonable timeframe in case it gets leaked etc.
I'm putting together a quick 0.29.1 to address an issue with a new feature, but I won't be working on this as part of that release. It touches some related functionality, and I don't want to mess too much with that and end up breaking the thing I'm putting the release out to fix.
It is on my radar for the next round of work. Just didn't want everyone to think I'd forgotten about this when it wasn't in this release. Sorry for the delay.
Any updates? We are experiencing a similar issue affecting the stability of a number of components we run with consul-template.
Checking in again. This is an urgent issue for stability when using token rotation. Can we get an updated timeline on the fix?
Ping. Please keep looking into resolving this.