ol-infrastructure

Pulumi and Vault token inheritance issues - Heroku app migrations

Open Ardiea opened this issue 1 year ago • 0 comments

References

Ref: #2074
Ref: https://support.hashicorp.com/hc/en-us/articles/360034820694-Parent-Child-Token-Hierarchy
Ref: https://www.pulumi.com/registry/packages/vault/api-docs/provider/
Ref: https://registry.terraform.io/providers/hashicorp/vault/latest/docs
Ref: https://www.pulumi.com/registry/packages/vault/api-docs/generic/getsecret/
Ref: https://developer.hashicorp.com/vault/docs/auth/approle
Ref: https://developer.hashicorp.com/vault/tutorials/auth-methods/approle

Description

As part of retiring the SaltStack infrastructure, we are migrating configuration management for our Heroku apps into Pulumi. To get credentials (database, AWS, etc.) we invoke the getSecret function included with the Pulumi Vault library. This works well, but every secret issued this way becomes a child of a short-lived intermediate token that is created when Pulumi is executed. This means any creds issued this way are valid for only 20 minutes.

Full summary / explanation:

  1. Pulumi logs into Vault with userpass, getting a 30-day token.
  2. Pulumi quietly issues itself a child token with a 20-minute TTL.
  3. Pulumi uses the child token to issue creds from auth/aws-mitx/ol-mitopen-application which have a 30-day TTL.
    a. These creds are a grandchild of the original token and a child of the second token.
    b. They look great and are installed into Heroku. They work nicely for 20 minutes, until the intermediate token expires and is revoked.
    c. The effective TTL of any token (read: a secret like an AWS access key) is the shortest TTL of itself OR of ANY ancestor token.
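To make the rule in 3c concrete, here is a minimal sketch of the token chain described above. It is only a model, not Vault's implementation: the `Token` class and `effective_ttl` function are hypothetical names, and it assumes all three TTLs start counting at the same moment.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60  # minutes in a day


@dataclass
class Token:
    """A Vault token or lease, reduced to what matters for expiry (hypothetical model)."""
    name: str
    ttl_minutes: int
    parent: Optional["Token"] = None


def effective_ttl(token: Token) -> int:
    """Return the shortest TTL found on the token or on any of its ancestors."""
    shortest = token.ttl_minutes
    node = token.parent
    while node is not None:
        shortest = min(shortest, node.ttl_minutes)
        node = node.parent
    return shortest


# The chain from steps 1-3: login token -> intermediate token -> issued creds.
login = Token("userpass login token", 30 * DAY)
intermediate = Token("pulumi child token", 20, parent=login)
aws_creds = Token("aws credentials", 30 * DAY, parent=intermediate)

print(effective_ttl(aws_creds))  # 20
```

Re-parenting `aws_creds` directly under `login` gives 30 days, which is the "remove step 2" option discussed below.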

This is bad and annoying because it means the two credentials installed into Heroku config vars (AWS_ACCESS_KEY and DATABASE_URL, or whatever) are only valid for 20 minutes.

The kick-the-can solution is to remove step 2, the short-lived intermediate token used by Pulumi during the up. Then the credentials will last at least 30 days. This isn't great either, because something that belongs specifically to the application context ends up governed by a Vault token issued to Pulumi for the purpose of managing infrastructure. What if we had some reason to revoke all the Pulumi userpass tokens? We would immediately break the application, because its credentials are still children of Pulumi's token.

Better Paths

Application Changes Required Path

Ideally the application would authenticate itself into Vault via AppRole and then issue its own credentials via the AWS and DB auth endpoints. AppRole is kind of complicated. It would be a lot easier / already-solved-in-practice if the apps were running in EC2. The app would need to track its own leases and tokens and renew them as needed.
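A rough sketch of what the application side could look like with hvac. The function name `issue_aws_credentials` and its parameters are made up for illustration, and the mount and role names are whatever the real deployment uses. The `client` argument is expected to be an `hvac.Client`; the block does not import hvac itself so that the flow can be exercised with a stand-in object.

```python
from typing import Any, Dict


def issue_aws_credentials(client: Any, role_id: str, secret_id: str,
                          aws_mount: str, aws_role: str) -> Dict[str, Any]:
    """Log in with AppRole, then request AWS credentials as the application.

    `client` is assumed to behave like hvac.Client, for example:
        client = hvac.Client(url="https://vault.example.com")  # hypothetical URL
    """
    # AppRole login. hvac stores the returned token on the client by default,
    # so later calls are made with the application's own token.
    client.auth.approle.login(role_id=role_id, secret_id=secret_id)

    # The resulting lease is a child of the application's token, not Pulumi's.
    response = client.secrets.aws.generate_credentials(
        name=aws_role, mount_point=aws_mount
    )
    return {
        "lease_id": response["lease_id"],
        "lease_duration": response["lease_duration"],
        "access_key": response["data"]["access_key"],
        "secret_key": response["data"]["secret_key"],
    }
```

The returned `lease_id` and `lease_duration` are the pieces the app would have to hold on to for renewals. How the app gets its `role_id` and `secret_id` on Heroku is left open here.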

Solve it with Infrastructure Path

Create a dynamic Pulumi provider that interfaces with Vault independently of the Pulumi Vault provider (probably using hvac). This provider will log in to Vault itself and issue credentials, which will be tracked as part of the provider state (token + lease expiration dates).

Some challenges include ensuring the stacks are upped frequently enough (and the up needs to be successful...) to keep the leases + tokens current and to renew / reissue them as needed. Is there a way to dynamically trigger an up based on these tracked leases and token TTLs? Also, what is "secret-zero"? That is: how do we let this dynamic provider actually interact with Vault? Right now the pulumi-vault provider just uses a userpass stored in sops. That isn't great either. There are better ways. Do we pursue the better ways for our dynamic provider?
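One possible shape for the "do we need an up?" check, as a sketch only. `TrackedLease` and `leases_needing_refresh` are hypothetical names, and the 7-day margin is an arbitrary example value. A scheduled job could run a check like this against the expirations stored in provider state and only kick off a `pulumi up` when something is close to expiring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class TrackedLease:
    """Hypothetical record the dynamic provider could keep in its state."""
    name: str
    expires_at: datetime


def leases_needing_refresh(leases: List[TrackedLease], now: datetime,
                           margin: timedelta) -> List[TrackedLease]:
    """Return leases that expire within `margin` of `now` (or already have)."""
    return [lease for lease in leases if lease.expires_at - now <= margin]


# Example with a fixed clock so the result is deterministic.
now = datetime(2024, 1, 23, tzinfo=timezone.utc)
leases = [
    TrackedLease("aws", now + timedelta(days=2)),
    TrackedLease("db", now + timedelta(days=20)),
]
due = leases_needing_refresh(leases, now, margin=timedelta(days=7))
print([lease.name for lease in due])  # ['aws']
```

This only answers when to refresh. It does not address the failed-up case or the secret-zero question.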

Ardiea, Jan 23 '24 19:01