kapp
kapp copied to clipboard
deterministic ordering of kubernetes.io/service-account-token secrets and SAs
In Kubernetes 1.24 there have been changes to how REST API access tokens are handled. This affects use of kapp-controller as described in separate issue https://github.com/vmware-tanzu/carvel-kapp-controller/issues/687. With such changes one can expect people to start using secrets of type kubernetes.io/service-account-token more often, but kapp appears to experience intermittent failures when presented with secrets of that type. This is even when using Kubernetes 1.23.
The actual error one gets is like:
kapp: Error: Applying create secret/sa-19 (v1) namespace: kapp-token-test:
Getting resource secret/sa-19 (v1) namespace: kapp-token-test:
API server says: secrets "sa-19" not found (reason: NotFound)
So kapp thinks the secret doesn't exist, even though it logged that it created it. Expectation is that the secret doesn't appear to be visible immediately as the Kubernetes admission controller takes time to inject the credentials and certificate into the token secret, since what a user creates is effectively a place holder with Kubernetes filling in the real details.
What steps did you take:
Initially only saw the problem twice, a week apart, but believe have a recipe for replicating the issue.
Create a resources.yaml file containing:
#@ load("@ytt:data", "data")
---
apiVersion: v1
kind: Namespace
metadata:
name: kapp-token-test
#@ for n in range(data.values.count):
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: #@ "sa-{}".format(n)
namespace: kapp-token-test
---
apiVersion: v1
kind: Secret
metadata:
name: #@ "sa-{}".format(n)
namespace: kapp-token-test
annotations:
kubernetes.io/service-account.name: #@ "sa-{}".format(n)
type: kubernetes.io/service-account-token
#@ end
Then run:
ytt -f resources.yaml --data-value-yaml count=20 | kapp deploy -a kapp-token-test -f - -y --debug | tee kapp.log
Increase the count value high enough that keeps trying many times in a single run with hope that it triggers the issue. Success may depend on the Kubernetes cluster being used. Issue has currently been seen on local Kind cluster.
What happened:
It would error with:
kapp: Error: Applying create secret/sa-19 (v1) namespace: kapp-token-test:
Getting resource secret/sa-19 (v1) namespace: kapp-token-test:
API server says: secrets "sa-19" not found (reason: NotFound)
For a full debug log see kapp.log attachment.
In this case the issue was only triggered on very last iteration, but have seen it occur earlier, just have to be lucky.
The result in the namespace at the point of failure was:
% k get secrets,serviceaccount -o name -n kapp-token-test
secret/default-token-h82jk
secret/sa-0
secret/sa-0-token-d7w9s
secret/sa-1
secret/sa-1-token-b7ph4
secret/sa-10
secret/sa-10-token-sppkb
secret/sa-11
secret/sa-11-token-8cwlq
secret/sa-12
secret/sa-12-token-c6vpv
secret/sa-13
secret/sa-13-token-g7d8z
secret/sa-14-token-47x2n
secret/sa-15
secret/sa-15-token-9zpdd
secret/sa-16
secret/sa-16-token-xgpht
secret/sa-17
secret/sa-17-token-bkmj8
secret/sa-18
secret/sa-18-token-gwqk5
secret/sa-19-token-th9w5
secret/sa-2-token-sf5bc
secret/sa-3
secret/sa-3-token-2l89p
secret/sa-4
secret/sa-4-token-pkwzc
secret/sa-5
secret/sa-5-token-wg2nq
secret/sa-6
secret/sa-6-token-sq6rr
secret/sa-7
secret/sa-7-token-sc6c5
secret/sa-8
secret/sa-8-token-plsf7
secret/sa-9
secret/sa-9-token-7zfns
serviceaccount/default
serviceaccount/sa-0
serviceaccount/sa-1
serviceaccount/sa-10
serviceaccount/sa-11
serviceaccount/sa-12
serviceaccount/sa-13
serviceaccount/sa-14
serviceaccount/sa-15
serviceaccount/sa-16
serviceaccount/sa-17
serviceaccount/sa-18
serviceaccount/sa-19
serviceaccount/sa-2
serviceaccount/sa-3
serviceaccount/sa-4
serviceaccount/sa-5
serviceaccount/sa-6
serviceaccount/sa-7
serviceaccount/sa-8
serviceaccount/sa-9
Note not sure if issue may in part be that kapp is reordering resources to create the secret before the service account. The Kubernetes docs say for token secrets:
When using this Secret type, you need to ensure that the kubernetes.io/service-account.name annotation is set to an existing service account name.
The docs thus say existing and since the secret has to have an annotation added with the uid of the service account, the order may actually be important. Thus if kapp does change order they are created, this would delay perhaps the secret being made visible if admission controller holds it for a bit until service account exists.
What did you expect:
Work reliably. :-)
Anything else you would like to add:
Nope.
Environment:
- kapp version (use
kapp --version):
kapp version 0.46.0
- OS (e.g. from
/etc/os-release):
Darwin xxx.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64
- Kubernetes version (use
kubectl version)
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-03-06T21:32:53Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
- Kind version
kind v0.13.0 go1.18.2 darwin/amd64
- Kind cluster create args:
--image kindest/node:v1.23.4
Vote on this request
This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.
👍 "I would like to see this addressed as soon as possible" 👎 "There are other more important things to focus on right now"
We are also happy to receive and review Pull Requests if you want to help working on this issue.
So using:
#@ load("@ytt:data", "data")
---
apiVersion: v1
kind: Namespace
metadata:
name: kapp-token-test
#@ for n in range(data.values.count):
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: #@ "sa-{}".format(n)
namespace: kapp-token-test
annotations:
kapp.k14s.io/change-group: "example.com/sa-with-separate-token-secret"
---
apiVersion: v1
kind: Secret
metadata:
name: #@ "sa-{}".format(n)
namespace: kapp-token-test
annotations:
kubernetes.io/service-account.name: #@ "sa-{}".format(n)
kapp.k14s.io/change-rule: "upsert after upserting example.com/sa-with-separate-token-secret"
type: kubernetes.io/service-account-token
#@ end
appears to solve the issue.
The question is whether the expectation is that users are going to have to know this and add the annotations themselves, or whether kapp could somehow accomodate this automatically.
A solution to this if possible would be an inbuilt rule to create secrets of type kubernetes.io/service-account-token after creating any service accounts, where as current behaviour seems to be to create all secrets before service accounts.
@GrahamDumpleton , thank your for the full analysis and way to reproduce the issue. This really helps. will take up on priority.
Expectation is that the secret doesn't appear to be visible immediately as the Kubernetes admission controller takes time to inject the credentials and certificate into the token secret
it's not that it's not visible, it's that it's deleted by k8s immediately (with tiny delay for some controller to notice) since k8s sees that there is no associated SA.
you can repro this reliably just by creating this single resource
apiVersion: v1
kind: Secret
metadata:
name: sa-1
annotations:
kubernetes.io/service-account.name: sa-1
type: kubernetes.io/service-account-token
kapp deploy may actually succeed but if you look just a tiny second later secret is gone from k8s. if you rerun kapp deploy it will show you that it's trying to create secret again because it doesnt exist.
k8s will not delete secret with that annotation if SA already exists in the server.
Initially I didn't know what was going on so that was best guess, but as I got to in the end I did surmise it was purely ordering requirement. :-)
The question is whether kapp can embody a rule that if a secret is of type kubernetes.io/service-account-token and includes the kubernetes.io/service-account.name annotation, that it is only created after service accounts, or is the upsert rule going to be the only way of dealing with this?
my 2c: this use case seems pretty rare, i think its worth seeing how common this problem becomes/is before promoting this behavior to a default rule.
We have noticed this in a couple of instances lately (especial when there are multiple SAs and Secrets present in the app). We should consider having some default rules to cover this. cc @renuy