Allow reading vault token from environment variables

Open tz-torchai opened this issue 1 year ago • 13 comments

Hi team, first of all, thanks for all your work!

Problem

This originated from this discussion.

In short, the vault token that kustomize-controller reads is currently expected not to expire, and it is the same token that was used to encrypt the data key. This raises two concerns:

  1. A Vault token is issued based on identity. The token an engineer obtains after vault login is attributed only to that user. It shouldn't be used inside kustomize-controller, which acts as a machine user, to decrypt against Vault. This violates the identity-based authorization principle.
  2. The expectation that the token does not expire increases the attack surface and raises security concerns.

Proposed Solution

By allowing kustomize-controller to read the sops vault-token value from an environment variable, we unlock many potential ways to supply that token to kustomize-controller. For example, there are various Vault agents, injectors, and webhooks that can work together to supply the vault token after authenticating against Vault using Kubernetes Auth.

After all, this will provide a more flexible way and allow customization on secret decryption.

tz-torchai avatar Jul 05 '22 19:07 tz-torchai

Hello there.

kustomize-controller expects a secret with a token with sufficient rights to fetch master keys from vault transit engine. kustomize-controller doesn't make any assumption on the token, so you can rotate it as you see fit.

I don't think there is an expectation that the token does not expire. If the sops decryption fails, the reconciliation stops and resumes at the next reconciliation interval.

Sops accepts vault-token value from an environment variable, but we disabled it in kustomize-controller. The reason being that attaching that env-var to the controller pod breaks multi-tenancy (any workload could be reconciled with that var).

Do you think it is feasible to use the Vault CSI Provider to sync your token to a kubernetes secret? You could then use that secret with your flux kustomization.

souleb avatar Jul 06 '22 08:07 souleb

Thanks for replying.

kustomize-controller doesn't make any assumption on the token, so you can rotate it as you see fit.

Do you think it is feasible to use the Vault CSI Provider to sync your token to a kubernetes secret? You could then use that secret with your flux kustomization.

It's accurate that kustomize-controller doesn't make any assumption about the token, but after a lot of research and effort I haven't found an easy way to sync a Vault token to that secret, and I'm not sure it's possible. That's why I said the token is expected not to expire, so that it doesn't need to be renewed.

I don't think Vault CSI Provider would work here since that is used to sync Vault secrets not Vault token.

The hard part is: how do we easily detect that the token is near expiry and renew it automatically?

What's more, the user is distributing their own token to another user to use even for a short period of time.

Sops accepts vault-token value from an environment variable, but we disabled it in kustomize-controller. The reason being that attaching that env-var to the controller pod breaks multi-tenancy (any workload could be reconciled with that var).

Good point. However, storing the token in a kubernetes secret, which lives in etcd, also brings security risks in a multi-tenant environment: afaik etcd is encrypted using a global key. Once that key is compromised, unauthorized users may gain access to other users' secrets.

If flux doesn't enforce reading the vault token from a kubernetes secret, a user can figure out how to minimize the attack surface and reduce risks in a multi-tenant environment. After all, users can always choose to provide it in a kubernetes secret. This proposal can work as a second option.

But before I comment further, could you elaborate a bit more on "any workload could be reconciled with that var"? I am really curious.

tz-torchai avatar Jul 06 '22 11:07 tz-torchai

When fetching the secret, kustomize-controller does it in the target namespace, not cluster wide, so you have isolation at that level. That's no longer the case with an env var.

The other way to rotate secret is to use a cronjob. Users do it for registry token rotation. the doc: https://fluxcd.io/docs/guides/cron-job-image-auth/#using-cronjob-to-sync-ecr-credentials-as-a-kubernetes-secret
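
As a rough illustration of what such a rotation job could check, here is a minimal Python sketch. It assumes the job has already read the token's remaining `ttl` from Vault's `auth/token/lookup-self` endpoint and renews through `auth/token/renew-self`. The function names, the 10-minute threshold, and the choice to skip tokens with a TTL of 0 are assumptions made for this example. Writing the renewed token back into the Kubernetes secret is left out.

```python
import urllib.request


def should_renew(ttl_seconds: int, threshold_seconds: int = 600) -> bool:
    """Return True when the remaining TTL is positive and at or below the threshold.

    Assumption of this sketch: a TTL of 0 or less means there is nothing to renew.
    """
    return 0 < ttl_seconds <= threshold_seconds


def build_renew_request(vault_addr: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to Vault's token renew-self endpoint."""
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/token/renew-self",
        data=b"{}",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```

A cron job would call `should_renew` on each run and only send the request when it returns true.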

souleb avatar Jul 06 '22 12:07 souleb

You mean an env var supplied to the kustomize-controller pod can be read by any workload cluster wide? How can that be done?

tz-torchai avatar Jul 06 '22 12:07 tz-torchai

I see it the other way around. How to make sure that an env var provided to a controller that reconciles workloads cluster wide can be constrained to only specific workloads?

A more viable way to go I think would be https://learn.hashicorp.com/tutorials/vault/approle

souleb avatar Jul 07 '22 10:07 souleb

see #695

souleb avatar Jul 07 '22 11:07 souleb

How to make sure that an env var provided to a controller that reconciles workloads cluster wide can be constrained to only specific workloads?

Sorry, but I couldn't figure this out: what's the difference between reading from a kubernetes secret and reading from an env var for this purpose? How would ks-controller reading a kubernetes secret from a target namespace constrain which workloads can reconcile with it?

But despite the above, if we take a step back, this proposal doesn't make Flux lose anything. It just unlocks a way for users to figure out a more secure, easier approach that integrates well with existing infrastructure (vault and kubernetes) to decrypt secrets with Vault.

Users are free to choose to supply the token from environment variables or kubernetes secrets.

see https://github.com/fluxcd/kustomize-controller/issues/695

I like this idea 😄

But I am not sure we want to go with this complex approach. I don't know how long it takes to implement, and this solution will need ~~to verify identities with an identity provider~~, extra setup and configuration. However, in my proposal, users can simply annotate the pod and utilize Kubernetes RBAC with Vault Policies for fine-grained access control.

tz-torchai avatar Jul 07 '22 13:07 tz-torchai

How to make sure that an env var provided to a controller that reconciles workloads cluster wide can be constrained to only specific workloads?

Sorry, but I couldn't figure this out: what's the difference between reading from a kubernetes secret and reading from an env var for this purpose? How would ks-controller reading a kubernetes secret from a target namespace constrain which workloads can reconcile with it?

But despite the above, if we take a step back, this proposal doesn't make Flux lose anything. It just unlocks a way for users to figure out a more secure, easier approach that integrates well with existing infrastructure (vault and kubernetes) to decrypt secrets with Vault.

see #695

I like this idea 😄

But I am not sure we want to go with this complex approach. I don't know how long it takes to implement, and this solution will need to verify identities with an identity provider, extra setup and configuration. However, in my proposal, users can simply annotate the pod and utilize Kubernetes RBAC with Vault Policies for fine-grained access control.

AppRole requires no outside tie-in; it's a Vault auth engine. Think of an AppRole as a service user that you assign policy to. You use the Role ID + Secret ID to authenticate and are given a short-lived token in return, with the policies and parameters specified in the AppRole.

In #695 I'm requesting that Flux can be configured with an AppRole instead of a token, knowing it'll retrieve a token on demand.
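
To make the Role ID + Secret ID exchange concrete, here is a minimal Python sketch of the AppRole login call against Vault's HTTP API. It assumes the auth method is mounted at the default `approle` path. The function names and the example address are made up for illustration, and nothing here reflects how Flux would actually implement it.

```python
import json
import urllib.request
from typing import Tuple


def build_approle_login(vault_addr: str, role_id: str, secret_id: str) -> urllib.request.Request:
    """Build (but do not send) the AppRole login request.

    Assumes the AppRole auth method is mounted at the default "approle" path.
    """
    payload = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/approle/login",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: bytes) -> Tuple[str, int]:
    """Return (client_token, lease_duration in seconds) from a login response body."""
    auth = json.loads(response_body)["auth"]
    return auth["client_token"], auth["lease_duration"]
```

Sending the request with `urllib.request.urlopen` and passing the response body to `extract_token` would yield the short-lived token and its lifetime.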

Reading this thread I think this is a good half-way house for me, thanks @souleb

When fetching the secret, kustomize-controller does it in the target namespace, not cluster wide, so you have isolation at that level. That's no longer the case with an env var.

The other way to rotate secret is to use a cronjob. Users do it for registry token rotation. the doc: https://fluxcd.io/docs/guides/cron-job-image-auth/#using-cronjob-to-sync-ecr-credentials-as-a-kubernetes-secret

nxzqio avatar Jul 07 '22 14:07 nxzqio

AppRole requires no outside tie-in; it's a Vault auth engine. Think of an AppRole as a service user that you assign policy to. You use the Role ID + Secret ID to authenticate and are given a short-lived token in return, with the policies and parameters specified in the AppRole.

For the implementation, I was talking about the time it takes before AppRole Auth can integrate with ks-controller.

In https://github.com/fluxcd/kustomize-controller/issues/695 I'm requesting that Flux can be configured with an AppRole instead of a token, knowing it'll retrieve a token on demand.

I think it's a matter of time and effort. By allowing reading from an env var, users can figure out the best way to retrieve and supply a token (e.g. keep it in memory and only visible to the process which requests it), without requiring much effort from the Flux team.

Another concern that makes me reluctant to use AppRole is, we are introducing new "identities" again... that are managed in different places.

In this example, https://learn.hashicorp.com/tutorials/vault/approle#step-2-create-a-role-with-policy-attached, it creates a jenkins role with a jenkins policy. The jenkins role is the new identity, which is managed in Vault.

As a best practice, I would try to manage all identities in a single identity pool as much as I can.

I would prefer utilizing the existing kubernetes RBAC, which integrates well and syncs with my IAM infrastructure, which manages all identities and policies.

Nevertheless, this proposal is about adding a new way to get tokens, not about preventing other ways. It's an addition. And I like the new AppRole idea.

I would appreciate it if the Flux team could adopt the ‘batteries included but replaceable’ approach, and give users the freedom to choose whichever suits them the best.

tz-torchai avatar Jul 07 '22 17:07 tz-torchai

Let's make sure we understand the proposal the same way.

The proposal is to enable the kustomize-controller to read vault-token from an env var. The env var would be injected in the pod after a successful connection to vault using a service account (kubernetes auth).

In this scenario, when declaring a kustomization we would have a choice:

  • declare a secret reference containing a vault-token
  • declare an env var to retrieve a vault-token
  • do nothing and expect a default env var to be used

kustomize-controller would retrieve the token accordingly.

if we take the following examples:

  • user A creates kustomization appA in namespace appA with an env var APP_A_VAULT_TOKEN
  • user B creates kustomization appB in namespace appB with an env var APP_A_VAULT_TOKEN


We would expect user B's kustomization reconciliation to fail, but it would actually succeed because we don't have a way to constrain the env var.
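
The difference can be sketched with a toy Python model. The dictionaries below are hypothetical in-memory stand-ins for the cluster's secrets and the controller pod's environment, and the function names are invented for this example; this is not the controller's code. The secret name `app-a-token` is also only illustrative.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical stand-ins: secrets are keyed by (namespace, name),
# while the controller pod has a single process-wide environment.
SECRETS: Dict[Tuple[str, str], str] = {("appA", "app-a-token"): "token-for-app-a"}
CONTROLLER_ENV: Mapping[str, str] = {"APP_A_VAULT_TOKEN": "token-for-app-a"}


def token_from_secret(kustomization_namespace: str, secret_name: str) -> Optional[str]:
    """Look up the secret only in the Kustomization's own namespace."""
    return SECRETS.get((kustomization_namespace, secret_name))


def token_from_env(kustomization_namespace: str, var_name: str) -> Optional[str]:
    """The namespace plays no role: any Kustomization naming the variable gets the token."""
    return CONTROLLER_ENV.get(var_name)
```

In this model, `token_from_secret("appB", "app-a-token")` finds nothing, while `token_from_env("appB", "APP_A_VAULT_TOKEN")` returns user A's token.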

Now, this is partly implemented. When we cannot retrieve a secret containing a vault-token we fall back to the default sops implementation which will try to retrieve the token from env var VAULT_TOKEN and then file $HOME/.vault-token. We do not advise using that method, and I personally think we should remove it. In any case I don't think going any further the env var way would be supported.
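
The lookup order described above could be modeled roughly like this in Python. It is only a sketch of the order (secret first, then `VAULT_TOKEN`, then `$HOME/.vault-token`), not the controller's or sops's actual code. The function name and parameters are hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_vault_token(
    secret_token: Optional[str],
    env: Mapping[str, str],
    home: Path,
) -> Optional[str]:
    """Return the first token found, in the order: secret, env var, token file."""
    # 1. Token taken from the secret referenced by the Kustomization, if any.
    if secret_token:
        return secret_token
    # 2. Fallback: the VAULT_TOKEN environment variable.
    env_token = env.get("VAULT_TOKEN")
    if env_token:
        return env_token
    # 3. Fallback: the .vault-token file in the home directory.
    token_file = home / ".vault-token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None
```

Calling it with `os.environ` and `Path.home()` would mirror the fallback for a real process.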

souleb avatar Jul 08 '22 08:07 souleb

The proposal is to enable the kustomize-controller to read vault-token from an env var. The env var would be injected in the pod after a successful connection to vault using a service account (kubernetes auth).

Correct.

user A creates kustomization appA in namespace appA with an env var APP_A_VAULT_TOKEN
user B creates kustomization appB in namespace appB with an env var APP_A_VAULT_TOKEN

We would expect user B's kustomization reconciliation to fail, but it would actually succeed because we don't have a way to constrain the env var.

Thanks for the example. I understand your concern now.

  1. You have a good point 👍 but there is no hard enforcement in ks-controller, so I can do the following using kubernetes secret approach:
kind: Kustomization
metadata:
  name: app-b
spec:
  decryption:
    provider: sops
    secretRef:
      name: app-a-token

It will also succeed.

but it would actually succeed because we don't have a way to constrain the env var.

Specifically, the reason we lose that "isolation", is because we can only reference the same vault token for different apps.


The core issue with both the k8s secret and the env var approach is that when ks-controller decrypts secrets on behalf of apps, it has no way to verify the identity of the app: it doesn't prevent app-b from decrypting app-a secrets using the app-a token.


After understanding your concern, it seems AppRole is a good approach. so it will allow app Kustomization to pass a role id and secret id to ks-controller which will then retrieve the token individually?

tz-torchai avatar Jul 08 '22 14:07 tz-torchai

on point 1. there is a hard enforcement at the namespace level. All workloads in the same namespace can use that secret, but appB in namespace B cannot use secret A in namespace A.

not sure I understand point 2.

After understanding your concern, it seems AppRole is a good approach. so it will allow app Kustomization to pass a role id and secret id to ks-controller which will then retrieve the token individually?

yes! And I imagine a token would have a single usage.

souleb avatar Jul 08 '22 15:07 souleb

https://github.com/banzaicloud/bank-vaults/blob/main/pkg/sdk/vault/client.go#L362

has a webhook that dynamically handles secrets on behalf of pods or by itself.

It uses the Kubernetes authentication method in Vault. Vault trusts the respective (in our case Flux's) cluster CA and the JWT of the serviceaccount of the flux kustomize-controller pod. There should be a role that links flux's serviceaccount and the flux-system namespace and is assigned a policy to use the transit engine.

Flux should then be able to authenticate itself against vault and retrieve a token to contact the transit engine.
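
For illustration, the login step of that flow might look like the following Python sketch. It assumes the Kubernetes auth method is mounted at the default `kubernetes` path and that the pod's service account token is mounted at the usual in-pod location. The function name and the role name used in the usage note are hypothetical.

```python
import json
import urllib.request

# Usual in-pod location of the projected service account token.
SERVICE_ACCOUNT_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_kubernetes_login(vault_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Build (but do not send) a login request for Vault's Kubernetes auth method.

    Assumes the auth method is mounted at the default "kubernetes" path.
    """
    payload = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/kubernetes/login",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Inside the pod, the `jwt` argument would be the contents of `SERVICE_ACCOUNT_TOKEN_PATH`, and `role` would be whatever Vault role is bound to the controller's service account (for example a hypothetical `flux-kustomize-controller`).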

squaricdot avatar Nov 01 '22 09:11 squaricdot