
Kubernetes 1.21 - Service registration is not reloading the service account token

Oro opened this issue 3 years ago • 6 comments

Describe the bug
With https://github.com/hashicorp/vault/pull/14144 the Kubernetes auth method was updated to be able to use the short-lived service account token. However, the service registration still expects a long-lived token, because the token is only loaded during initialization - https://github.com/hashicorp/vault/blob/04195248444cad1dfb862f2946dc3250da3ccd5b/serviceregistration/kubernetes/client/client.go#L46 with https://github.com/hashicorp/vault/blob/04195248444cad1dfb862f2946dc3250da3ccd5b/serviceregistration/kubernetes/client/config.go#L54
This will result in service registration no longer working in a future version of k8s, or if k8s is configured with --service-account-extend-token-expiration=false.
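
For illustration, a minimal hypothetical sketch (not Vault's actual code) of the load-once pattern the links above point at: the token is captured a single time at startup and reused for the lifetime of the process, so it goes stale once the kubelet rotates the projected file.

    package client

    import "os"

    const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

    // Client holds the bearer token read once at initialization; nothing
    // ever refreshes it afterwards.
    type Client struct {
        token string
    }

    // New reads the projected service account token exactly once.
    func New() (*Client, error) {
        b, err := os.ReadFile(tokenPath)
        if err != nil {
            return nil, err
        }
        return &Client{token: string(b)}, nil
    }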

To Reproduce
Steps to reproduce the behavior:

  1. Start Kubernetes with audit logging enabled so you can see whether something is using a stale token; for minikube see e.g. https://minikube.sigs.k8s.io/docs/tutorials/audit-policy/ . For example, I've used the following to start the cluster:
mkdir -p ~/.minikube/files/etc/ssl/certs

cat <<EOF > ~/.minikube/files/etc/ssl/certs/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
EOF

minikube start --cpus 2 --memory 16Gi --kubernetes-version v1.23.3 --driver=podman --cni calico --extra-config=apiserver.audit-policy-file=/etc/ssl/certs/audit-policy.yaml --extra-config=apiserver.audit-log-path=-
  2. Install Vault via the Helm chart: helm upgrade --install vault hashicorp/vault --set server.image.tag=1.9.3 --set server.ha.enabled=true --set server.ha.replicas=1 --set server.ha.raft.enabled=true
  3. After two hours (I've disabled ntp and advanced the local time, e.g. sudo date --set="$(date --date="2 hours")"), do something that should trigger a label change for the vault pod, e.g. kubectl exec -ti vault-0 -- vault operator init. The audit logs will then show the stale token: kubectl logs -f kube-apiserver-minikube -n kube-system | grep audit.k8s.io/v1 | grep stale | grep vault-0
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"e708572a-7a8d-4783-8e77-df02dbdcb932","stage":"RequestReceived","requestURI":"/api/v1/namespaces/default/pods/vault-0","verb":"patch","user":{"username":"system:serviceaccount:default:vault","uid":"650b4c68-f4a5-4fa2-94c9-66e16e069ca7","groups":["system:serviceaccounts","system:serviceaccounts:default","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["vault-0"],"authentication.kubernetes.io/pod-uid":["484d8d51-c6b6-4a1a-8d7c-8d94430aa90b"]}},"sourceIPs":["10.244.120.66"],"userAgent":"Go-http-client/1.1","objectRef":{"resource":"pods","namespace":"default","name":"vault-0","apiVersion":"v1"},"requestReceivedTimestamp":"2022-02-23T17:10:43.610665Z","stageTimestamp":"2022-02-23T17:10:43.610665Z","annotations":{"authentication.k8s.io/stale-token":"subject: system:serviceaccount:default:vault, seconds after warning threshold: 3704"}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"e708572a-7a8d-4783-8e77-df02dbdcb932","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/default/pods/vault-0","verb":"patch","user":{"username":"system:serviceaccount:default:vault","uid":"650b4c68-f4a5-4fa2-94c9-66e16e069ca7","groups":["system:serviceaccounts","system:serviceaccounts:default","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["vault-0"],"authentication.kubernetes.io/pod-uid":["484d8d51-c6b6-4a1a-8d7c-8d94430aa90b"]}},"sourceIPs":["10.244.120.66"],"userAgent":"Go-http-client/1.1","objectRef":{"resource":"pods","namespace":"default","name":"vault-0","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2022-02-23T17:10:43.610665Z","stageTimestamp":"2022-02-23T17:10:43.628776Z","annotations":{"authentication.k8s.io/stale-token":"subject: system:serviceaccount:default:vault, seconds after warning threshold: 3704","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by RoleBinding \"vault-discovery-rolebinding/default\" of Role \"vault-discovery-role\" to ServiceAccount \"vault/default\"","mutation.webhook.admission.k8s.io/round_0_index_0":"{\"configuration\":\"vault-agent-injector-cfg\",\"webhook\":\"vault.hashicorp.com\",\"mutated\":false}","pod-security.kubernetes.io/enforce-policy":"privileged:latest"}}

Expected behavior
The service account token should be periodically refreshed.

Environment:

  • Vault Server Version: 1.9.3
  • Vault CLI Version: 1.9.3
  • Server Operating System/Architecture: minikube k8s 1.23.3 with podman/x64

Vault server configuration file(s):

    disable_mlock = true
    ui = true

    listener "tcp" {
      tls_disable = 1
      address = "[::]:8200"
      cluster_address = "[::]:8201"
    }

    storage "raft" {
      path = "/vault/data"
    }

    service_registration "kubernetes" {}

Additional context
Just to be clear: auth/kubernetes works fine with 1.9.3; this issue is solely about service registration not being updated to also handle short-lived k8s tokens.

Also note: The pod will still get the correct labels set, as Kubernetes 1.21 does not yet enforce short-lived tokens by default.

Oro avatar Feb 23 '22 15:02 Oro

Looks to me like the token needs to be reloaded either every single time the client makes a call, or via a cache. Did you already have/want to make a PR for this yourself? @Oro

RemcoBuddelmeijer avatar Feb 23 '22 16:02 RemcoBuddelmeijer

client-go would transparently do the reloading out of the box: https://github.com/kubernetes/client-go/blob/v0.22.0/rest/config.go#L75-L78

	// Path to a file containing a BearerToken.
	// If set, the contents are periodically read.
	// The last successfully read value takes precedence over BearerToken.
	BearerTokenFile string

It seems to do this every minute: https://github.com/kubernetes/client-go/blob/v0.22.0/transport/token_source.go#L74-L78

However, Vault seems to have its own custom client implementation, which does not implement the reloading: https://github.com/hashicorp/vault/blob/d341378c6353a9e0f776ca024215ea11be8672df/serviceregistration/kubernetes/client/client.go#L32-L36
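
For illustration, a minimal sketch of how the reloading would come for free with the official client: rest.InClusterConfig() sets BearerTokenFile to the projected token path, so the transport re-reads the file periodically. (The pod name and namespace here are just the values from this issue.)

    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        // InClusterConfig sets BearerTokenFile, enabling transparent reloads.
        cfg, err := rest.InClusterConfig()
        if err != nil {
            panic(err)
        }
        clientset, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }
        // Long-running processes keep working across token rotations because
        // the cached file token source re-reads the token (roughly once a minute).
        pod, err := clientset.CoreV1().Pods("default").Get(context.TODO(), "vault-0", metav1.GetOptions{})
        if err != nil {
            panic(err)
        }
        fmt.Println(pod.Labels)
    }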

lwimmer avatar Feb 23 '22 17:02 lwimmer

@lwimmer If the intention is to keep the implementation as it is now, the caching file loader from the Kubernetes plugin could be moved to the SDK. Then the behaviour in both the auth method and the rest of Vault would be the same.

On top of this, luckily enough, the caching loader from Vault's Kubernetes auth method plugin behaves the same as client-go's, with the upside of not pulling in the dependencies.

https://github.com/hashicorp/vault-plugin-auth-kubernetes/blob/20f85c165dd8181fc164ae622b5d0dd2d5cfd565/backend.go#L32-L38

Perhaps for now doing just this, or something similar, would keep the scope of the change small? The issue would then be fixed, and the debate about introducing the official client could start later once the criteria are met (multiple smaller modules, or a real necessity for it).
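
For illustration, a rough sketch (assumed shape, not the plugin's actual code) of such a caching file loader: re-read the token file at most once per minute, matching client-go's interval, and fall back to the last good value if a read fails.

    package client

    import (
        "os"
        "sync"
        "time"
    )

    // cachingTokenLoader re-reads a projected token file with a small TTL so
    // rotated tokens are picked up without hitting the disk on every request.
    type cachingTokenLoader struct {
        path   string
        mu     sync.Mutex
        token  string
        readAt time.Time
    }

    func (c *cachingTokenLoader) Token() (string, error) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if c.token != "" && time.Since(c.readAt) < time.Minute {
            return c.token, nil
        }
        b, err := os.ReadFile(c.path)
        if err != nil {
            if c.token != "" {
                return c.token, nil // keep serving the last successful read
            }
            return "", err
        }
        c.token, c.readAt = string(b), time.Now()
        return c.token, nil
    }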

RemcoBuddelmeijer avatar Feb 23 '22 17:02 RemcoBuddelmeijer

I might be able to take a stab at a PR; however, I'd first want to clarify how this should go:

The easy way out (duplicating some of the changes from https://github.com/hashicorp/vault-plugin-auth-kubernetes/pull/122 ) does not seem ideal; however, for such minor functionality as service registration it is probably good enough?

As for the added labels: This issue is NOT about auth/k8s, it's about service registration/service discovery - can't comment on the ecosystem label though, dunno if that one fits.

Oro avatar Feb 24 '22 07:02 Oro

The easy way out (duplicating some of the changes from hashicorp/vault-plugin-auth-kubernetes#122 ) does not seem ideal; however, for such minor functionality as service registration it is probably good enough?

How and why would this change not be ideal? I agree that an implementation that doesn't require much or any maintenance would be better. But the current implementation revolves around replicating the go-client without the extra dependencies, so this change fits perfectly with the current setup.

As for the added labels: This issue is NOT about auth/k8s, it's about service registration/service discovery - can't comment on the ecosystem label though, dunno if that one fits.

@hsimon-hashicorp thoughts on this? I agree that this is not related to k8s auth.

RemcoBuddelmeijer avatar Feb 24 '22 12:02 RemcoBuddelmeijer

How and why would this change not be ideal?

You're right, I was under the false impression that there would be a lot of duplication between vault-plugin-auth-kubernetes and vault-core. Having the caching file loader in the SDK (and switching the plugin to it later on) seems like a good idea. The point about the full k8s go-client is also understandable; maybe someone will do a refactor later on if needed.

I'll try preparing a PR soon-ish; it seems to be not too difficult(tm).

EDIT: I'll not be able to create a PR in the foreseeable future; if someone else wants to take this up, please feel free.

Oro avatar Feb 25 '22 11:02 Oro

Hi folks, so it turns out the k8s client used for service registration does actually reload the service account token if it encounters an error when making an API call. In the example from the description, there should be a log line like this in the Vault logs:

2023-02-09T20:22:04.699Z [WARN]  service_registration.kubernetes: unable to update state for 
/metadata/labels/vault-initialized due to PATCH https://10.96.0.1:443/api/v1/namespaces/vault/pods/vault-0 
giving up after 2 attempt(s): bad status code: req method: PATCH, req url: 
https://10.96.0.1:443/api/v1/namespaces/vault/pods/vault-0, resp statuscode: 401, will retry

This is coming from the getCheckRetry() function, which reloads the token and cert by calling inClusterConfig() when the k8s API responds with 401 or 403; the update call is then retried.
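
For illustration, a sketch (not Vault's exact code) of that retry-hook pattern using hashicorp/go-retryablehttp: on a 401/403 the token is reloaded from disk before the request is retried. The reloadToken and setToken helpers here are hypothetical stand-ins for the inClusterConfig() refresh.

    package client

    import (
        "context"
        "net/http"
        "os"

        retryablehttp "github.com/hashicorp/go-retryablehttp"
    )

    const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

    // reloadToken re-reads the projected service account token from disk.
    func reloadToken() (string, error) {
        b, err := os.ReadFile(tokenPath)
        return string(b), err
    }

    // newCheckRetry returns a retryablehttp.CheckRetry that refreshes the
    // token whenever the API server rejects the current one.
    func newCheckRetry(setToken func(string)) retryablehttp.CheckRetry {
        return func(ctx context.Context, resp *http.Response, err error) (bool, error) {
            if err != nil {
                return true, err // transport error: retry with what we have
            }
            if resp.StatusCode == http.StatusUnauthorized || resp.StatusCode == http.StatusForbidden {
                if tok, rerr := reloadToken(); rerr == nil {
                    setToken(tok) // next attempt uses the fresh token
                }
                return true, nil
            }
            return resp.StatusCode >= 500, nil
        }
    }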

I tried this out locally a bit by setting up a service account for vault whose token would be refreshed every 10 minutes, by using values overrides like this with the chart:

server:
  ha:
    enabled: true
    replicas: 1
    raft:
      enabled: true
  logLevel: debug
  volumes:
  - name: kube-api-access
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 600
          path: token
      - configMap:
          items:
            - key: ca.crt
              path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
  volumeMounts:
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: kube-api-access

Thanks for bringing this up, but I'll go ahead and close the issue for now.

tvoran avatar Feb 09 '23 23:02 tvoran