aws-iam-authenticator daemonset unable to start due to host fs permissions
/kind bug
1. What kops version are you running? The command kops version, will display
this information.
1.23.0
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:51:05Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:52:18Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/arm64"}
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Follow this guide for an existing cluster:
https://kops.sigs.k8s.io/authentication/#aws-iam-authenticator
6. What did you expect to happen?
aws-iam-authenticator daemonset starts up successfully
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
....
authentication:
  aws:
    image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.7-arm64
authorization:
  rbac: {}
...
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
time="2022-05-06T20:04:02Z" level=info msg="starting mapper \"MountedFile\""
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM role" groups="[system:masters]" role="arn:aws:iam::xxx:role/ZZZZ" username=kubernetes-admin
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM user" groups="[system:masters]" user="arn:aws:iam::xxx:user/greg.tinoco" username=YYY
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM user" groups="[system:masters]" user="arn:aws:iam::xxx:user/gkogan" username=UUU
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM Account" accountID=xxx
time="2022-05-06T20:04:02Z" level=fatal msg="could not load/generate a certificate" error="open /var/aws-iam-authenticator/cert.pem: permission denied"
9. Anything else do we need to know?
The code seems to be directly affected by the 600 permissions; it appears 666 is what is needed to make aws-iam-authenticator happy:
https://github.com/kubernetes/kops/blob/acacf62cdff6dfe68b724eef45b1e8e249e47824/nodeup/pkg/model/kube_apiserver.go#L324-L341
As soon as I manually set 777 on the directory and 666 on the files inside `/var/aws-iam-authenticator/`, the daemonset started working fine. I can't think of a reason why aws-iam-authenticator would need write access there, but it looks like it is required.
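For reference, the manual workaround above can be sketched as plain shell. This sketch recreates the layout in a scratch directory rather than touching a real node; on an actual control-plane node the path is /var/aws-iam-authenticator and the commands need root:

```shell
# Recreate the problematic layout in a scratch directory
state_dir=$(mktemp -d)
touch "$state_dir/cert.pem" "$state_dir/key.pem"
chmod 600 "$state_dir"/*.pem   # the mode kops 1.23 writes; unreadable to the pod's user

# The manual workaround from the issue: open up the directory and the files
chmod 777 "$state_dir"
chmod 666 "$state_dir"/*.pem

stat -c '%a' "$state_dir/cert.pem"   # prints 666
rm -rf "$state_dir"
```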
here's the daemonset def:
- name: aws-iam-authenticator
  image: >-
    602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.7-arm64
  args:
    - server
    - '--config=/etc/aws-iam-authenticator/config.yaml'
    - '--state-dir=/var/aws-iam-authenticator'
    - '--kubeconfig-pregenerated=true'
FYI, as a workaround, you can add the following to the master IG:
additionalUserData:
  - content: |-
      #!/bin/sh
      chmod 666 /srv/kubernetes/aws-iam-authenticator/*.pem
    name: z_aws-iam-authenticator-fix-permission.sh
    type: text/x-shellscript
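For context, here is roughly where that snippet sits in a full control-plane InstanceGroup manifest. The metadata names, cluster name, and machine type below are hypothetical placeholders; only the additionalUserData block comes from the workaround itself:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: master-us-west-2a                  # hypothetical IG name
  labels:
    kops.k8s.io/cluster: my.example.com    # hypothetical cluster name
spec:
  role: Master
  machineType: m6g.large                   # hypothetical arm64 instance type
  additionalUserData:
    - name: z_aws-iam-authenticator-fix-permission.sh
      type: text/x-shellscript
      content: |-
        #!/bin/sh
        chmod 666 /srv/kubernetes/aws-iam-authenticator/*.pem
```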
I couldn't reproduce this. Wondering if you are using a custom AMI or something else that may affect this.
No, it was the default Ubuntu AMI, but for arm64, which in theory should not make a difference. The only things that might be different are that I was enabling aws-iam-authenticator after the cluster was already provisioned, and that I needed to use the arm64 container 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.7-arm64. I have had this happen in two clusters already, and it is pretty reproducible from what I saw.
I believe this commit should fix the issue: https://github.com/kubernetes/kops/commit/74310774f1805b9d48a2f308948ce3b71a25b56d
My daemonset did not have that change, but it was created with kops 1.23 and this commit is for 1.24.x, so maybe it will work going forward :)
Ah. This may be why I didn't spot the problem either :)
kops 1.24 is not too far away. I expect the 1.24 beta 2 release will be pretty much the same as GA. Are you able to test this release?
I will be provisioning a new cluster in a day or so, so I will try to report back, but I do think it should work. BTW, I noticed another issue related to using arm64 on the masters, which may or may not be present in 1.24: the image for aws-iam-authenticator defaults to the x86_64/amd64 architecture even though the master is arm64. It is easily corrected by specifying the right image, which is what led me to this issue :)
I suggest filing another issue for the arm64 bug. Most likely something upstream needs to be fixed.
Kops 1.24 still has the same bug: W0713 14:43:51.365708 9808 builder.go:215] failed to digest image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.5". I tried changing the image selection to 0.5.7/8/9, -amd64, debian-jessy, and scratch.
Which one? The main issue I reported is not related to the actual aws-iam-authenticator image, so changing it will not make a difference. Also, I do not believe I saw this specific message in the logs. If you are referring to the arm64 vs amd64 issue, I have not opened that one yet, so I doubt anybody has fixed it ...
try adding this to the aws-iam-authenticator pod definition:
containers:
  - name: aws-iam-authenticator
    ...
    securityContext:
      capabilities:
        drop:
          - ALL
      runAsUser: 10000
      runAsGroup: 10000
      allowPrivilegeEscalation: false
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
FYI, as a workaround, you can add the following to the master IG:
additionalUserData:
  - content: |-
      #!/bin/sh
      chmod 666 /srv/kubernetes/aws-iam-authenticator/*.pem
    name: z_aws-iam-authenticator-fix-permission.sh
    type: text/x-shellscript
I believe this commit should fix the issue: 7431077
I installed kops 1.26.2, which uses the 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.12 image, and I still had to add your additionalUserData snippet.
@gregkoganvmm You saved my day :+1: