envconsul hangs forever when it can't fetch from vault

Open RasmusWL opened this issue 6 years ago • 8 comments

Envconsul version

envconsul v0.7.3 (daa2947)

Configuration

upcase = true

vault {
  address = "https://vault:8200"

  # envconsul will automatically get a new VAULT_TOKEN.
  renew_token = true

  ssl {
    enabled = true
    verify  = false
  }

  retry {
    attempts = 1
  }
}

secret {
  no_prefix = true
  format = "FOO_{{ key }}"
  path = "_vault/non-existing"
}

Command

export VAULT_TOKEN=$(curl -s <URL>)
envconsul -config config.hcl /bin/true

Debug output

https://gist.github.com/RasmusWL/f4e21f069f16f025177d885eaf2c24c2

Expected behavior

Either of these, although I would prefer the first:

  1. envconsul will pass control to the program, without setting the FOO_<key> environment variables
  2. envconsul exits with an error code

Actual behavior

envconsul hangs forever. The status from ps is Sl+, meaning it is in interruptible sleep (waiting for an event to complete).

Steps to reproduce

  1. Add a secret with a path that does not exist, or one that the VAULT_TOKEN does not have permission to read.
  2. Run envconsul -config config.hcl /bin/true
  3. Stare forever at your terminal, because envconsul hangs.

RasmusWL avatar Mar 22 '18 10:03 RasmusWL

I wouldn't have seen this as a huge problem if my Docker container hadn't sat for several days in AWS Batch waiting for the process to exit.

Does anyone know of a workaround I can use to detect this in bash?

euclideansphere avatar Sep 19 '18 20:09 euclideansphere
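
A possible workaround sketch for batch-style runs, assuming GNU coreutils `timeout` is available in the container: wrap the whole envconsul invocation in a deadline so that a hang turns into a nonzero exit code. `run_with_deadline` is a hypothetical helper name, not something envconsul provides. The deadline also covers the child program's own run time, so this only fits jobs with a known upper bound.

```shell
#!/usr/bin/env bash
# run_with_deadline SECONDS CMD [ARGS...]
# Hypothetical helper (not part of envconsul): runs CMD under coreutils
# `timeout` and reports when the deadline was hit.
run_with_deadline() {
  local seconds="$1"
  shift
  local status=0
  # timeout sends SIGTERM after $seconds; -k 5 follows up with SIGKILL
  # five seconds later if the command is still alive.
  timeout -k 5 "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # 124 is the exit status timeout uses when the deadline expired.
    echo "command timed out after ${seconds}s: $*" >&2
  fi
  return "$status"
}
```

For example, `run_with_deadline 300 envconsul -config config.hcl /bin/true` should return 124 if envconsul is still running after five minutes, which a surrounding script or scheduler can treat as a failure.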

Surprisingly, the same thing happens if there are no secret or prefix entries in the configuration file (although I guess this should be filed as a separate bug report).

RasmusWL avatar Sep 20 '18 10:09 RasmusWL

"Envconsul is highly fault tolerant, meaning it does not exit in the face of failure." - envconsul configuration-file

It seems to be working as intended. 🤨
So maybe this should be a new feature/enhancement request.
For batch-style jobs, envconsul needs to exit once the retry limit is reached (that seems like the sane default).

envconsul v0.7.3

config.hcl

vault {
  retry {
    enabled = true
    attempts = 5
    backoff = "2s"
    max_backoff = "10s"
  }
}

Output

* permission denied (retry attempt 1 after "2s")
* permission denied (retry attempt 2 after "4s")
* permission denied (retry attempt 3 after "8s")
* permission denied (retry attempt 4 after "10s")
* permission denied (retry attempt 5 after "10s")
* permission denied (exceeded maximum retries)
2018/10/03 20:24:34.443921 [WARN] vault.token: renewer returned (maybe the lease expired)
2018/10/03 20:24:34.443974 [ERR] (view) lease expired or is not renewable (exceeded maximum retries)
2018/10/03 20:24:34.444010 [ERR] (runner) watcher reported error: lease expired or is not renewable
...hangs forever...

jmcmaster05 avatar Oct 03 '18 20:10 jmcmaster05

envconsul --help

-once Do not run the process as a daemon

So the solution would be something like:

envconsul -once -config config.hcl /bin/true

jmcmaster05 avatar Oct 03 '18 21:10 jmcmaster05

+1 on just passing control to the program without setting the environment variable. If a Vault secret is required by the application, then the application will not start up successfully. Let us decide how to handle that scenario. In our use case, specifying a non-existent path is a valid state. Take the following for example:

secret/app/default/my-app {key=value}
secret/app/dev/my-app (no values at the moment)

I would like the app to start up with the value from the default path and simply ignore the fact that secret/app/dev/my-app does not exist. In the future, if we need to override the value in the dev environment, we can simply add the secret and restart the app without having to modify our .hcl file.

Romack avatar Nov 29 '18 18:11 Romack

@jmcmaster05 the Env section of the README gives an example with -once. As I understand it, if you use -once, envconsul won't poll for configuration changes and automatically restart the process 😞 So yes, if you're only running a one-off script, -once will do fine; otherwise not 😐

RasmusWL avatar Jan 09 '19 10:01 RasmusWL

Hi @catsby, is there any chance that you (or someone else at HashiCorp) could have a look at this? Is there some simple fix that we are overlooking?

RasmusWL avatar Jan 09 '19 10:01 RasmusWL

Bumped into this too. I'm not running batch-style jobs, just ordinary pods on Kubernetes. From time to time, due to networking issues, a pod can end up running indefinitely with a revoked token. And -once is not a solution, unfortunately.

It would be nice to be able to instruct envconsul to die when the token can't be renewed or has explicitly expired, and let Kubernetes (or whatever external scheduler) handle this.

nvkv avatar Mar 02 '20 05:03 nvkv
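
For long-running services where -once is not an option, one workaround sketch is a small wrapper that watches envconsul's log output for the "exceeded maximum retries" message shown in the log earlier in this thread, then terminates envconsul so the external scheduler can restart it. This assumes the message arrives on stderr and that SIGTERM is enough to stop a hung envconsul; neither is confirmed in this thread. `kill_on_pattern` is a hypothetical helper name, and the return code 70 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# kill_on_pattern PATTERN CMD [ARGS...]
# Hypothetical helper (not part of envconsul): runs CMD, relays its stderr,
# and sends SIGTERM to CMD as soon as a stderr line contains PATTERN.
kill_on_pattern() {
  local pattern="$1"
  shift
  local dir fifo pid line
  local hit=0 status=0
  dir="$(mktemp -d)"
  fifo="$dir/stderr"
  mkfifo "$fifo"

  # Start the command in the background with stderr routed through the FIFO.
  "$@" 2>"$fifo" &
  pid=$!

  # Relay each stderr line and stop at the first one that matches.
  while IFS= read -r line; do
    printf '%s\n' "$line" >&2
    case "$line" in
      *"$pattern"*)
        hit=1
        kill "$pid" 2>/dev/null || true
        break
        ;;
    esac
  done <"$fifo"

  rm -rf "$dir"
  wait "$pid" || status=$?
  if [ "$hit" -eq 1 ]; then
    return 70   # arbitrary code meaning "pattern seen, command terminated"
  fi
  return "$status"
}
```

A call might look like `kill_on_pattern "exceeded maximum retries" envconsul -config config.hcl my-service`, where `my-service` stands in for the real program. If the pattern never appears, the wrapper returns the command's own exit status.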