Feature Request: Support projected bound service account tokens when run in-cluster

Open aiman-alsari opened this issue 4 years ago • 21 comments

Kubernetes has a beta feature to increase service account token security that is planned to go to General Availability and will become the default at some point.

The general idea is that rather than using the automounted service account token at /var/run/secrets/kubernetes.io/serviceaccount/token, you can project a volume that contains a pod-scoped token that is valid for the lifetime of the pod using it. The token can also be auto-rotated based on some TTL. This means an accidental token leak doesn't require a manual rotation of keys etc.

See https://github.com/kubernetes/community/blob/master/contributors/design-proposals/auth/bound-service-account-tokens.md and https://github.com/mikedanese/community/blob/2bf41bd80a9a50b544731c74c7d956c041ec71eb/contributors/design-proposals/storage/svcacct-token-volume-source.md

It would be great if the kubernetes-client could handle the SA token being updated in the filesystem. Currently it stores the value of the token in the io.fabric8.kubernetes.client.Config object's oauthToken property. AFAIK this never gets reloaded. Currently the downstream consumers of the client will need to reload their client objects to refresh the config when a token rotates. Whilst this is doable, it feels like it should be taken care of by the kubernetes-client.

It's also not possible (that I could see) to configure the SA token path, which could be projected to somewhere other than the default location of /var/run/secrets/kubernetes.io/serviceaccount/token.

aiman-alsari avatar Jun 08 '20 15:06 aiman-alsari

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

stale[bot] avatar Sep 20 '20 03:09 stale[bot]

The BoundServiceAccountTokenVolume feature has been promoted to beta, and enabled by default in Kubernetes 1.21 https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#changelog-since-v1200

I think this feature request is more relevant now that BoundServiceAccountTokenVolume is enabled by default in latest Kubernetes.

titisan avatar May 12 '21 12:05 titisan

@titisan: Sorry, looks like I missed this while upgrading the Kubernetes model to v1.21.0. Do you know in which package BoundServiceAccountTokenVolume exists? Would it be possible for you to contribute a PR for this? We'll be happy to provide code pointers.

rohanKanojia avatar May 12 '21 12:05 rohanKanojia

I was checking the code and I wonder whether the interceptor for handling expired OIDC tokens (TokenRefreshInterceptor) will also refresh the service account token when it expires.

https://github.com/fabric8io/kubernetes-client/blob/a2b67bf6f36fc21c42812055a2c4183643b62c09/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/utils/TokenRefreshInterceptor.java#L40

I think the "else" will be executed in case of authenticating with the service account token. The autoconfigure() will reload the token.

According to the Kubernetes 1.21 release notes: "Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token." The current implementation of TokenRefreshInterceptor will only refresh (reload) the token when the API server rejects a request with HTTP status code 401.
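
The once-per-minute reload recommended in the release notes can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not how kubernetes-client implements it: `PeriodicTokenReloader` and its methods are hypothetical names, and the token path and period are supplied by the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of kubernetes-client): re-reads a token file on a fixed schedule. */
final class PeriodicTokenReloader implements AutoCloseable {
  private final Path tokenFile;
  private final ScheduledExecutorService scheduler;
  private volatile String token; // volatile so request threads see the latest value

  PeriodicTokenReloader(Path tokenFile, long period, TimeUnit unit) throws IOException {
    this.tokenFile = tokenFile;
    this.token = read(); // fail fast if the file cannot be read at startup
    this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "sa-token-reloader");
      t.setDaemon(true); // do not keep the JVM alive just for reloading
      return t;
    });
    scheduler.scheduleAtFixedRate(this::reloadQuietly, period, period, unit);
  }

  /** Returns the most recently loaded token. */
  String currentToken() {
    return token;
  }

  private String read() throws IOException {
    return new String(Files.readAllBytes(tokenFile), StandardCharsets.UTF_8).trim();
  }

  private void reloadQuietly() {
    try {
      token = read();
    } catch (IOException | RuntimeException e) {
      // Keep the last known token. An exception escaping this method
      // would cancel all further scheduled runs.
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

A caller would construct it once, for example with the default token path and a period of one minute, and read `currentToken()` whenever it builds the Authorization header for a request.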

titisan avatar May 18 '21 09:05 titisan

I think #3105 took care of this.

"The current implementation of TokenRefreshInterceptor will only refresh (reload) the token when the API server rejects a request with HTTP status code 401."

IMHO it's cheaper to do this (+backwards compatible), than having a periodic job that reloads the token from the disk files (i.e. once every time is needed instead of once per minute even if it's not needed).

manusa avatar May 18 '21 10:05 manusa

Thanks @manusa, I do agree it is a cheaper solution than reloading the token from disk periodically.

Looking forward to having #3105 in the next release.

titisan avatar May 18 '21 13:05 titisan

It is cheaper but incorrect. If I read the docs right, you will not get the 401. Not for a year!

As part of the transition to time-limited tokens, the initial token is good for a year, but after a time D (between 1 and 3 hours) it will be refreshed. The old one remains valid, but the whole point is to generate Prometheus metrics to find clients that have not been updated to reload the token (such as this one). And this client (well, its users) will be flagged as "stale".

See https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/1205-bound-service-account-tokens

Handling a 401 is therefore necessary but insufficient, and relying on it alone makes it impossible for consumers of this library to know whether it is working properly (and in fact in a year it will stop working).

mikebell90 avatar Feb 01 '22 15:02 mikebell90

From the 1.21 release notes:

The BoundServiceAccountTokenVolume feature has been promoted to beta, and enabled by default.

  • This changes the tokens provided to containers at /var/run/secrets/kubernetes.io/serviceaccount/token to be time-limited, auto-refreshed, and invalidated when the containing pod is deleted.
  • Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token. k8s.io/client-go version v11.0.0+ and v0.15.0+ reload tokens automatically.
  • By default, injected tokens are given an extended lifetime so they remain valid even after a new refreshed token is provided. The metric serviceaccount_stale_tokens_total can be used to monitor for workloads that are depending on the extended lifetime and are continuing to use tokens even after a refreshed token is provided to the container. If that metric indicates no existing workloads are depending on extended lifetimes, injected token lifetime can be shortened to 1 hour by starting kube-apiserver with --service-account-extend-token-expiration=false. (#95667, @zshihang) [SIG API Machinery, Auth, Cluster Lifecycle and Testing]

mikebell90 avatar Feb 01 '22 15:02 mikebell90

Obviously a hack could be used (a fragile one) wherein everyone who needs a Client instance gets it from a Supplier class, and "pinky-promises" to use it only as a local variable (never stored in an instance field). Then the Supplier could reload the Client in a background thread and replace an AtomicReference. That seems hacky and kind of expensive.
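
A minimal sketch of that Supplier hack, assuming nothing about the client beyond a factory that builds a fresh instance. `RefreshingClientSupplier` is a hypothetical name and the class is generic on purpose, so it does not depend on any kubernetes-client API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical wrapper: hands out the current client and swaps in a fresh one periodically. */
final class RefreshingClientSupplier<C> implements Supplier<C>, AutoCloseable {
  private final Supplier<C> factory; // assumed to build a client from freshly loaded config
  private final AtomicReference<C> current;
  private final ScheduledExecutorService scheduler;

  RefreshingClientSupplier(Supplier<C> factory, long period, TimeUnit unit) {
    this.factory = factory;
    this.current = new AtomicReference<>(factory.get());
    this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "client-refresher");
      t.setDaemon(true);
      return t;
    });
    scheduler.scheduleAtFixedRate(this::refresh, period, period, unit);
  }

  /** Callers must call get() for every operation and never cache the result in a field. */
  @Override
  public C get() {
    return current.get();
  }

  /** Builds a new client and publishes it atomically; also invoked by the scheduler. */
  void refresh() {
    try {
      current.set(factory.get());
      // The previous client is deliberately not closed here: another thread may
      // still be using it. Deciding when to release it is part of what makes
      // this approach fragile and expensive.
    } catch (RuntimeException e) {
      // Keep the previous client; an escaping exception would cancel the schedule.
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

The factory passed in would be whatever call the application already uses to build its client from auto-configured settings. Long-lived objects that capture a client once would not pick up the replacement.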

mikebell90 avatar Feb 01 '22 16:02 mikebell90

@manusa I don't see any way for us to plug in alternative interceptors (conveniently at least). And have you seen the above? It's an issue IMO.

mikebell90 avatar Feb 28 '22 17:02 mikebell90

We could add a configuration option that would mark the token stale after n period.

Then a few options:

  • Scheduled thread that reloads the Token file
  • Check staleness for each operation and reload Token file if needed
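
The second option, checking staleness on each operation, could look roughly like the sketch below. It is an assumption-laden illustration rather than the client's code: `StaleAwareTokenSource` is a hypothetical name, and the clock is injected so that the "n period" can be tested without waiting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.function.LongSupplier;

/** Hypothetical helper: reloads the token file lazily once the cached copy is older than maxAge. */
final class StaleAwareTokenSource {
  private final Path tokenFile;
  private final long maxAgeNanos;
  private final LongSupplier nanoClock; // e.g. System::nanoTime in production
  private String token;
  private long loadedAtNanos;

  StaleAwareTokenSource(Path tokenFile, Duration maxAge, LongSupplier nanoClock) {
    this.tokenFile = tokenFile;
    this.maxAgeNanos = maxAge.toNanos();
    this.nanoClock = nanoClock;
    reload();
  }

  /** Called before each request; touches the disk only when the cached token is stale. */
  synchronized String token() {
    if (nanoClock.getAsLong() - loadedAtNanos >= maxAgeNanos) {
      reload();
    }
    return token;
  }

  private void reload() {
    try {
      token = new String(Files.readAllBytes(tokenFile), StandardCharsets.UTF_8).trim();
      loadedAtNanos = nanoClock.getAsLong();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Compared with a scheduled thread, this needs no background executor, at the cost of a timestamp comparison on every call and an occasional file read on the request path.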

manusa avatar Mar 02 '22 15:03 manusa

This also affects informers, which I believe are not as easy to hack as basic usage.

mikebell90 avatar Mar 29 '22 20:03 mikebell90

Hi,

As AWS is rolling out Kubernetes 1.21 nowadays, I was wondering what is the status of the support of this feature?

Has it been released already?

Cheers!

victornoel avatar May 12 '22 09:05 victornoel

This needs more visibility, because as far as I understand AWS EKS disabled the default 1-year extended period and configured a 90-day period instead.

balonik avatar May 12 '22 15:05 balonik

I am working on implementing this

EDIT: Actually I am not sure how but somehow I was confused and thought this repo was related to the k8s ruby plugin for fluentd: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/337

PettitWesley avatar May 31 '22 16:05 PettitWesley

Change is here: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/337

It is already released in the plugin version 2.11.1

Though I am still working on doing final testing actually... but as far as we can tell it works. I will post an update if it actually doesn't.

PettitWesley avatar Jun 28 '22 01:06 PettitWesley

(It works don't worry)

PettitWesley avatar Jun 28 '22 07:06 PettitWesley

@PettitWesley @manusa Any update on this? Are your changes merged into kubernetes-client?

vgaddavcg avatar Jul 01 '22 16:07 vgaddavcg

@vgaddavcg Changes are merged here: https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/337

And released in plugin version 2.11.1 from what I saw. Anything else beyond that is up to you folks to handle

PettitWesley avatar Jul 06 '22 21:07 PettitWesley

@PettitWesley I'm a bit confused about how an issue in the Java-based Kubernetes client relates to a PR in a Ruby-based Fluentd plugin.

scholzj avatar Jul 06 '22 22:07 scholzj

@scholzj @vgaddavcg Sorry yea it seems that I got confused and thought this repo was also related to Fluentd... apologies, I have not fixed anything in this repo.

PettitWesley avatar Jul 14 '22 19:07 PettitWesley

Hello, any update on this? Critical parts of our operations are impacted and the 90-day grace period allowed by EKS to ensure compatibility is running out. Thanks in advance,

AbdelrhmanHamouda avatar Sep 20 '22 10:09 AbdelrhmanHamouda

@AbdelrhmanHamouda I believe it was fixed in https://github.com/fabric8io/kubernetes-client/pull/4264 and shipped with 6.1.0.

victornoel avatar Sep 20 '22 15:09 victornoel

thanks @victornoel, this is a great help.

AbdelrhmanHamouda avatar Sep 20 '22 17:09 AbdelrhmanHamouda

Thanks Victor, I'll close the issue seeing as it has been fixed and merged.

aiman-alsari avatar Sep 21 '22 02:09 aiman-alsari

I think the provided solution is more of a brute-force approach: it refreshes the service account token every minute. Could we test for the recommended expires_in before preemptively refreshing the token, or just let the request fail, then refresh the access token and retry?

apiwoni avatar Sep 22 '22 19:09 apiwoni

@apiwoni I haven't looked at the current implementation but your question does make sense to me. Would you be able to get a PR together for this improvement?

andreaTP avatar Sep 22 '22 20:09 andreaTP

I believe that will break the metrics. Please read the KEP. There may be extended tokens for a year, but not rereading and reapplying the token every minute violates the KEP.

mikebell90 avatar Sep 22 '22 20:09 mikebell90

It is very similar to the original “let’s wait for a 401” approach, already demonstrated to be incorrect.

mikebell90 avatar Sep 22 '22 20:09 mikebell90

@mikebell90 Does any of this really apply in cases where the client provides the access token without using KUBECONFIG, getting it via a custom API instead? I know the duration of my access tokens. Microsoft Azure AD does not provide refresh tokens for the OAuth client credentials flow, so I need to generate a new token when the old one expires. It does not seem that I can use OpenIDConnectionUtils#resolveOIDCTokenFromAuthConfig to generate a new access token when refresh is not supported and the token has expired.

apiwoni avatar Sep 22 '22 22:09 apiwoni