aws-iam-authenticator daemonset unable to start due to host fs permissions

Open gregkoganvmm opened this issue 3 years ago • 10 comments

/kind bug

1. What kops version are you running? The command kops version, will display this information.

1.23.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:51:05Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:52:18Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/arm64"}

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Follow this guide for an existing cluster: https://kops.sigs.k8s.io/authentication/#aws-iam-authenticator

6. What did you expect to happen?

The aws-iam-authenticator daemonset starts up successfully.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

....
  authentication:
    aws:
      image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.7-arm64
  authorization:
    rbac: {}

...

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

time="2022-05-06T20:04:02Z" level=info msg="starting mapper \"MountedFile\""
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM role" groups="[system:masters]" role="arn:aws:iam::xxx:role/ZZZZ" username=kubernetes-admin
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM user" groups="[system:masters]" user="arn:aws:iam::xxx:user/greg.tinoco" username=YYY
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM user" groups="[system:masters]" user="arn:aws:iam::xxx:user/gkogan" username=UUU
time="2022-05-06T20:04:02Z" level=info msg="mapping IAM Account" accountID=xxx
time="2022-05-06T20:04:02Z" level=fatal msg="could not load/generate a certificate" error="open /var/aws-iam-authenticator/cert.pem: permission denied"

9. Anything else do we need to know?

The failure appears to come directly from the 600 permissions set on these files by the code linked below. Mode 666 seems to be what aws-iam-authenticator needs in order to start: https://github.com/kubernetes/kops/blob/acacf62cdff6dfe68b724eef45b1e8e249e47824/nodeup/pkg/model/kube_apiserver.go#L324-L341

As soon as I manually set 777 on the directory `/var/aws-iam-authenticator/` and 666 on the files inside it, the daemonset started working fine. I can't think of any reason why aws-iam-authenticator would need write access there, but it looks like it is required.
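
To see which files are affected before changing anything, a quick look at the mode bits can help. The following is a minimal sketch and not part of kops or aws-iam-authenticator: `not_world_readable` is a hypothetical helper name, and it assumes a `find` that supports `-maxdepth` (GNU and BSD `find` both do). It prints every `.pem` file in the given directory that lacks the world-read bit, meaning files that a container running as a different, non-root user could not open.

```shell
#!/bin/sh
# not_world_readable DIR
# Print each regular *.pem file directly inside DIR whose mode does not
# include the "other read" bit (octal 004). A file with mode 600 is listed;
# a file with mode 644 or 666 is not.
not_world_readable() {
    find "$1" -maxdepth 1 -type f -name '*.pem' ! -perm -004
}

# Example invocation on a control-plane node (host path taken from the
# workaround posted in this thread):
#   not_world_readable /srv/kubernetes/aws-iam-authenticator
```

An empty result would suggest the permission bits are not the cause and that something else (for example directory permissions) is blocking access.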

Here's the daemonset definition:

        - name: aws-iam-authenticator
          image: >-
            602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.7-arm64
          args:
            - server
            - '--config=/etc/aws-iam-authenticator/config.yaml'
            - '--state-dir=/var/aws-iam-authenticator'
            - '--kubeconfig-pregenerated=true'

gregkoganvmm avatar May 06 '22 23:05 gregkoganvmm

FYI, as a workaround, you can add the following to the master IG

additionalUserData:
  - content: |-
      #!/bin/sh
      chmod 666 /srv/kubernetes/aws-iam-authenticator/*.pem
    name: z_aws-iam-authenticator-fix-permission.sh
    type: text/x-shellscript
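
Mode 666 leaves the certificate and key world-writable. A narrower variant might be to hand the files to the UID the container runs as and keep them at 600. The sketch below is untested against kops and rests on assumptions: `restrict_to_owner` is a hypothetical helper, and the `10000:10000` owner in the example invocation is taken from the `runAsUser`/`runAsGroup` values posted later in this thread, so it only applies if the daemonset actually runs with that security context.

```shell
#!/bin/sh
# restrict_to_owner DIR OWNER
# Give every *.pem file in DIR to OWNER (a uid or uid:gid) and set mode 600,
# so only that owner can read or write the files.
restrict_to_owner() {
    dir="$1"
    owner="$2"
    for f in "$dir"/*.pem; do
        # If the glob matched nothing, "$f" is the literal pattern; skip it.
        [ -e "$f" ] || continue
        chown "$owner" "$f" && chmod 600 "$f" || return 1
    done
}

# Hypothetical invocation on a control-plane node (must run as root):
#   restrict_to_owner /srv/kubernetes/aws-iam-authenticator 10000:10000
```

If the container runs as a different UID, or as root, the `chmod 666` workaround above remains the approach reported to work in this thread.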

gregkoganvmm avatar Jun 13 '22 20:06 gregkoganvmm

I couldn't reproduce this. Wondering if you are using a custom AMI or something else that may affect this.

olemarkus avatar Jun 14 '22 05:06 olemarkus

No, it was the default Ubuntu AMI, but for arm64, which in theory should not make a difference. The only things that might be different are that I enabled aws-iam-authenticator after the cluster was already provisioned, and that I needed to use the arm64 container image 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.7-arm64. This has already happened to me in two clusters, and it is pretty reproducible from what I saw.

gregkoganvmm avatar Jun 14 '22 16:06 gregkoganvmm

I believe this commit should fix the issue: https://github.com/kubernetes/kops/commit/74310774f1805b9d48a2f308948ce3b71a25b56d

My daemonset did not have it, but it was created with 1.23 and this commit is for 1.24.x, so maybe it will work going forward :)

gregkoganvmm avatar Jun 14 '22 20:06 gregkoganvmm

Ah. This may be why I didn't spot the problem either :)

kops 1.24 is not too far away. I expect the 1.24 beta 2 release will be pretty much the same as GA. Are you able to test this release?

olemarkus avatar Jun 15 '22 05:06 olemarkus

I will be provisioning a new cluster in a day or so and will try to report back, but I do think it should work. BTW, I noticed another issue related to using arm64 on the masters, which may or may not still be an issue with 1.24: the image for aws-iam-authenticator defaults to the x86_64/amd64 architecture even though the master is arm64. It is easily corrected by specifying the right image, which is what led me to this issue :)

gregkoganvmm avatar Jun 15 '22 19:06 gregkoganvmm

I suggest filing another issue for the arm64 bug. Most likely it is something that needs to be fixed upstream.

olemarkus avatar Jun 17 '22 17:06 olemarkus

kops 1.24 still has the same bug:

W0713 14:43:51.365708 9808 builder.go:215] failed to digest image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.5"

I tried changing the image selection (0.5.7/8/9, -amd64 / debian-jessy / scratch).

rikmos avatar Jul 13 '22 21:07 rikmos

Which one? The main issue I reported is not related to the actual aws-iam-authenticator image, so changing it will not make a difference. Also, I do not believe I saw this specific message in the logs. If you are referring to the arm64 vs amd64 issue, I have not opened that one yet, so I doubt anybody has fixed it ...

Try adding this to the aws-iam-authenticator pod definition:

  containers:
    - name: aws-iam-authenticator
      ...
      securityContext:
        capabilities:
          drop:
            - ALL
        runAsUser: 10000
        runAsGroup: 10000
        allowPrivilegeEscalation: false

gregkoganvmm avatar Jul 14 '22 02:07 gregkoganvmm

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 12 '22 03:10 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Nov 11 '22 03:11 k8s-triage-robot

/close not-planned

k8s-triage-robot avatar Dec 11 '22 04:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 11 '22 04:12 k8s-ci-robot

I installed kops 1.26.2, which uses the 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.12 image, and I still had to add your additionalUserData snippet.

@gregkoganvmm You saved my day :+1:

setrar avatar Mar 23 '23 16:03 setrar