Roles with paths do not work when the path is included in their ARN in the aws-auth configmap

Open jceresini opened this issue 6 years ago • 50 comments

I have a role with an ARN that looks like this: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner. My aws-auth configmap was as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/EKSWorkerNode
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/EKSServiceWorker
      username: kubernetes-admin
      groups:
        - system:masters
    - rolearn: arn:aws:iam::XXXXXXXXXXXX:role/gitlab-ci/gitlab-runner
      username: gitlab-admin
      groups:
        - system:masters

I repeatedly got Unauthorized errors from the cluster until I updated the rolearn to arn:aws:iam::XXXXXXXXXXXX:role/gitlab-runner. After that change, my access worked as expected.

If it makes a difference, I'm using assume-role on our gitlab-runner, and using aws eks update-kubeconfig --region=us-east-1 --name=my-cluster to get kubectl configured.

jceresini avatar Sep 12 '19 21:09 jceresini

Running into the same issue here on EKS 1.14.6.

beetahnator avatar Sep 17 '19 21:09 beetahnator

Ahh....this explains our issue when testing with AWS SSO-created roles too. See the issue referenced in this document. This has been a problem for quite a while (at least 14 months).

https://aws.amazon.com/blogs/opensource/integrating-ldap-ad-users-kubernetes-rbac-aws-iam-authenticator-project/

Pertinent passage: For the rolearn be sure to remove the /aws-reserved/sso.amazonaws.com/ from the rolearn url, otherwise the arn will not be able to authorize as a valid user.

When we stumbled across this I assumed it was something about the SSO role but based on this issue it's probably the path.

casey-robertson avatar Sep 19 '19 22:09 casey-robertson

We don't use EKS, but have had this issue with 1.12 and 1.14.6 with aws-iam-authenticator. If you edit the configmap to remove the /gitlab-ci portion, and restart the pods, you will likely find that access works.

My co-worker and I suspect this is because of the way STS reports ARNs for assumed-role sessions.

We have a role arn:aws:iam::000000000000:role/bosun/bosun_deploy that we use for cluster administration of our kops-created clusters.

If we assume the role and run aws sts get-caller-identity, we get the following:

{
    "UserId": "<redacted-AKID>:<redacted-userid>",
    "Account": "000000000000",
    "Arn": "arn:aws:sts::000000000000:assumed-role/bosun_deploy/<redacted-userid>"
}

I wish this were fixed. As of now, I'm not sure what to do other than creating a role with a shortened path and switching to it.

I suppose one can also just edit the role that gets input to the configmap itself.
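
To make the suspicion above concrete, here is a minimal Python sketch of what happens when a role ARN is rebuilt from the STS output shown earlier. Whether the authenticator does exactly this internally is an assumption; role_arn_from_assumed_role is a hypothetical helper, and the session name is made up. The point is only that the assumed-role ARN carries the role name and nothing else, so no rebuild can recover the path.

```python
import re

# Matches STS assumed-role ARNs such as
# arn:aws:sts::000000000000:assumed-role/bosun_deploy/<session-name>
_ASSUMED_ROLE = re.compile(
    r"^arn:(?P<partition>[^:]+):sts::(?P<account>\d{12})"
    r":assumed-role/(?P<role>[^/]+)/(?P<session>.+)$"
)


def role_arn_from_assumed_role(sts_arn: str) -> str:
    """Rebuild an IAM role ARN from an STS assumed-role ARN (hypothetical helper).

    The STS ARN contains only the role name, so the result never has a path.
    """
    m = _ASSUMED_ROLE.match(sts_arn)
    if m is None:
        raise ValueError(f"not an assumed-role ARN: {sts_arn!r}")
    return f"arn:{m['partition']}:iam::{m['account']}:role/{m['role']}"


# The ARN as it would be entered in the configmap (path included).
configured = "arn:aws:iam::000000000000:role/bosun/bosun_deploy"

# The ARN rebuilt from what STS reports; "example-session" is a placeholder.
derived = role_arn_from_assumed_role(
    "arn:aws:sts::000000000000:assumed-role/bosun_deploy/example-session"
)

print(derived)                # arn:aws:iam::000000000000:role/bosun_deploy
print(derived == configured)  # False
```

A plain string comparison between the two therefore fails, which would be consistent with the Unauthorized errors reported in this thread.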

rlangfordBV avatar Oct 15 '19 19:10 rlangfordBV

Yeah, removing the path is how I identified it as the cause of the issue.

The field name is rolearn and the path is part of the ARN for a given role.

I opened this so others running into the issue might find it, and also because I think something needs to address it, whether it's documentation (though I don't think docs are sufficient without changing the name of the field in the configmap) or a bugfix.

jceresini avatar Oct 16 '19 13:10 jceresini

We just discovered the same, by using

$ curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
$ TOKEN=$(aws-iam-authenticator token -i fooCluster --token-only)
$ aws-iam-authenticator verify -i fooCluster -t ${TOKEN}

and comparing the roles that the Pod uses (containing a path) vs. the ones that are set in the token (path missing).

For now our workaround is also adding a role mapping to an IAM Role that "doesn't actually exist".

jangrewe avatar Oct 28 '19 12:10 jangrewe

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jan 26 '20 13:01 fejta-bot

/remove-lifecycle stale

jceresini avatar Jan 27 '20 17:01 jceresini

I was able to reproduce this issue. I created two roles, K8s-Admin and K8s-Admin-WithPath, using the following commands:

  aws iam create-role \
  --role-name K8s-Admin \
  --description "Kubernetes administrator role (for AWS IAM Authenticator for Kubernetes)." \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::<account id>:root"},"Action":"sts:AssumeRole","Condition":{}}]}' \
  --output text \
  --query 'Role.Arn'

  aws iam create-role \
  --role-name K8s-Admin-WithPath \
  --path "/kubernetes/" \
  --description "Kubernetes administrator role (for AWS IAM Authenticator for Kubernetes)." \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::<account id>:root"},"Action":"sts:AssumeRole","Condition":{}}]}' \
  --output text \
  --query 'Role.Arn'

Mapped them to the cluster with:

eksctl create iamidentitymapping --cluster basic-demo --arn arn:aws:iam::<account id>:role/K8s-Admin --group system:masters --username iam-admin

eksctl create iamidentitymapping --cluster basic-demo --arn arn:aws:iam::<account id>:role/kubernetes/K8s-Admin-WithPath --group system:masters --username iam-admin-withpath

Then I attached the AWS ReadOnly policy to both roles. Next, I created two AWS CLI profiles, sandbox-k8s-admin and sandbox-k8s-admin-withpath, each specifying a role_arn option to trigger an assume-role. After creating the profiles, I updated my local kubeconfig:

eksctl utils write-kubeconfig --cluster=basic-demo --profile=sandbox-k8s-admin --set-kubeconfig-context --region=us-east-2

kubectl get nodes
# returned list of nodes, expected

Then switched over to the role with the path

eksctl utils write-kubeconfig --cluster=basic-demo --profile=sandbox-k8s-admin-withpath --set-kubeconfig-context --region=us-east-2

kubectl get nodes
# error: You must be logged in to the server (Unauthorized)

arhea avatar Apr 20 '20 20:04 arhea

Any news on this? This is quite a weird behavior and hard to detect as an error.

Comradin avatar May 04 '20 11:05 Comradin

We are seeing this issue as well, any word on resolution?

JeremyProffitt avatar Jun 04 '20 14:06 JeremyProffitt

+1

gaochundong avatar Jun 16 '20 11:06 gaochundong

I've enjoyed my 6+ hours lost to this.

sidewinder12s avatar Jun 22 '20 23:06 sidewinder12s

terraform workaround:

join("/", values(regex("(?P<prefix>arn:aws:iam::[0-9]+:role)/[^/]+/(?P<role>.*)", <role-arn>)))

I'm not sure this is still needed with v0.5.1.
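
For anyone generating the configmap outside Terraform, the same idea can be sketched in Python. This is an illustration only: strip_role_path is a hypothetical helper name, and the pattern assumes a 12-digit account ID. As far as I can tell, the expression above removes exactly one path segment and raises an error when the ARN has no path at all; the sketch below drops any number of segments and leaves a path-less ARN unchanged.

```python
import re

# prefix = everything up to ":role", then an optional path, then the role name.
_ROLE_ARN = re.compile(
    r"^(?P<prefix>arn:[^:]+:iam::\d{12}:role)/(?:.*/)?(?P<name>[^/]+)$"
)


def strip_role_path(role_arn: str) -> str:
    """Return the role ARN with its path removed (hypothetical helper)."""
    m = _ROLE_ARN.match(role_arn)
    if m is None:
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return f"{m['prefix']}/{m['name']}"


# One path segment, as in the original report.
print(strip_role_path("arn:aws:iam::123456789012:role/gitlab-ci/gitlab-runner"))

# Two path segments, as with SSO-created roles; "ExampleRole" is a placeholder.
print(strip_role_path(
    "arn:aws:iam::123456789012:role/aws-reserved/sso.amazonaws.com/ExampleRole"
))

# No path: returned unchanged.
print(strip_role_path("arn:aws:iam::123456789012:role/EKSWorkerNode"))
```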

fred-vogt avatar Aug 12 '20 03:08 fred-vogt

This was a very easy work-around for us, thank you

deadanon avatar Sep 18 '20 23:09 deadanon

Any update? Seems that this is still an issue.

nxtof avatar Nov 02 '20 08:11 nxtof

Hello, I'm having the same issue with aws-iam-authenticator version 0.5.2

othmane399 avatar Nov 18 '20 09:11 othmane399

This caught me today too; what a PIA indeed. I can confirm that an instance role with a path will not be able to auth against the cluster. Hopefully this gets fixed soon.

Jan 28 05:05:01 ip-10-31-8-66.us-west-1.compute.internal kubelet[3907]: E0128 05:05:01.251418    3907 kubelet_node_status.go:92] Unable to register node "ip-10-31-8-66.us-west-1.compute.internal" with API server: Unauthorized

Adding this in the hope it saves someone else a few hours of their life.

mattjamesaus avatar Jan 28 '21 05:01 mattjamesaus

A fix could be to have iam:GetRole permissions and "lookup" the full role info by "short" role name.

  • https://github.com/kubernetes-sigs/aws-iam-authenticator/blob/master/pkg/arn/arn.go Canonicalize() - lookup the full role info to return the Role ARN from AWS (so roles with non default (/) path would have the correct ARN)
  • https://docs.aws.amazon.com/sdk-for-go/api/service/iam/#IAM.GetRole
    • https://docs.aws.amazon.com/sdk-for-go/api/service/iam/#GetRoleOutput
      • https://docs.aws.amazon.com/sdk-for-go/api/service/iam/#Role

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/iam/get-role.html

I could create a sample PR if that helps.
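
The lookup idea can be sketched in Python rather than Go, just to show the shape of it. Everything here is hypothetical: full_role_arn and FakeIam are illustrative names, not part of the authenticator. The only real API assumed is the IAM GetRole call, whose response contains Role.Arn and Role.Path; the fake client below mimics that shape so the sketch runs without AWS credentials.

```python
from typing import Any, Dict


def full_role_arn(iam: Any, role_name: str) -> str:
    """Return the complete role ARN, including any path, for a bare role name.

    `iam` is anything with a get_role(RoleName=...) method that returns a
    GetRole-shaped response, for example a boto3 IAM client.
    """
    response: Dict[str, Any] = iam.get_role(RoleName=role_name)
    return response["Role"]["Arn"]


class FakeIam:
    """Offline stand-in for an IAM client; only get_role is implemented."""

    def __init__(self, paths: Dict[str, str]) -> None:
        self._paths = paths  # role name -> path, e.g. "/kubernetes/"

    def get_role(self, RoleName: str) -> Dict[str, Any]:
        path = self._paths[RoleName]
        arn = f"arn:aws:iam::123456789012:role{path}{RoleName}"
        return {"Role": {"RoleName": RoleName, "Path": path, "Arn": arn}}


fake = FakeIam({"K8s-Admin": "/", "K8s-Admin-WithPath": "/kubernetes/"})
print(full_role_arn(fake, "K8s-Admin"))
print(full_role_arn(fake, "K8s-Admin-WithPath"))
```

With a real client one would pass boto3.client("iam") instead of FakeIam. The trade-off is the one already noted: the caller needs iam:GetRole permission, and every lookup is an extra API call unless results are cached.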

fred-vogt avatar Jan 29 '21 02:01 fred-vogt

Between #333, #268, #153 and #98 - would be good to get duplicates closed and it tracked in one place

billinghamj avatar Mar 24 '21 15:03 billinghamj

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jun 22 '21 15:06 fejta-bot

/remove-lifecycle stale

christophetd avatar Jun 22 '21 19:06 christophetd

@jceresini would you be willing to update the issue description to mention the likely duplicates? That would help with triage.

sftim avatar Jul 02 '21 14:07 sftim

I'm not sure what you mean by that @sftim

The issue here is that the aws-auth configMap expects a roleArn, but you have to mangle the actual roleArn for it to work. When I submitted this, the caveat wasn't documented (to my knowledge). Now this document seems to mention it:

https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html

Important

The role ARN cannot include a path. The format of the role ARN must be arn:aws:iam::<123456789012>:role/. For more information, see aws-auth ConfigMap does not grant access to the cluster.

IMO, that means the roleArn field in the configMap isn't the roleArn.

If the authentication works without the path, I would assume it's easy for the logic that performs the authentication to handle the ARN with or without the path. That would save new users, who enter the actual roleArn into the configMap, from running into this odd behavior... without breaking functionality for everyone that has already entered a path-less roleArn in their config as a workaround.

jceresini avatar Jul 02 '21 14:07 jceresini

Please copy the list of duplicates from https://github.com/kubernetes-sigs/aws-iam-authenticator/issues/268#issuecomment-805911359 into the description of this issue @jceresini (at the top - there's an edit button). That copying will make the duplication of issues more obvious.

sftim avatar Jul 02 '21 15:07 sftim

I've never seen GitHub issues handled that way. GitHub has a way to mark issues as duplicates and make it obvious: https://docs.github.com/en/issues/tracking-your-work-with-issues/marking-issues-or-pull-requests-as-a-duplicate

Regarding that list of issues:

  • #333 does not appear to be a duplicate. It sounds like they're asking about maintaining changing role ARNs in the k8s configMap
  • #268 is this issue
  • #153 appears to be a duplicate
  • #98 appears to be a duplicate, but it was closed ~15mo before this was opened

jceresini avatar Jul 02 '21 15:07 jceresini

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 30 '21 16:09 k8s-triage-robot

If you were willing to list those issues in the description for this issue, @jceresini, you'd be making life a little easier for other contributors. /remove-lifecycle stale

sftim avatar Sep 30 '21 17:09 sftim

/lifecycle stale

k8s-triage-robot avatar Dec 29 '21 17:12 k8s-triage-robot

/remove-lifecycle stale

christophetd avatar Dec 31 '21 11:12 christophetd

I think one way to fix this is for the authenticator to use the full ARN, if provided, when doing a lookup, and otherwise default to the base role path (/).

dokuboyejo avatar Feb 16 '22 19:02 dokuboyejo