AWS sts:AssumeRole stopped working with role/OrganizationAccountAccessRole in 1.30.x
/kind bug
1. What kops version are you running? The command kops version, will display
this information.
Testing upgrade from Client version: 1.29.2 (git-v1.29.2) to Client version: 1.30.1 (git-v1.30.1)
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
v1.29.9
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops_v1.30.1 update cluster - no other changes to manifest or environment, only executing newer kops binary.
5. What happened after the commands executed?
$ export AWS_PROFILE=company-name-dev3
$ kops_v1.30.1 update cluster
SDK 2024/09/20 14:31:06 DEBUG request failed with unretryable error https response error StatusCode: 403, RequestID: 623bd87e-11e1-4b06-9f16-10f60ba2f030, api error AccessDenied: User: arn:aws:sts::[redacted]006:assumed-role/OrganizationAccountAccessRole/aws-go-sdk-1726842666098977639 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::[redacted]006:role/OrganizationAccountAccessRole
Error: error determining default DNS zone: error querying zones: error listing hosted zones: operation error Route 53: ListHostedZones, get identity: get credentials: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 623bd87e-11e1-4b06-9f16-10f60ba2f030, api error AccessDenied: User: arn:aws:sts::[redacted]006:assumed-role/OrganizationAccountAccessRole/aws-go-sdk-1726842666098977639 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::[redacted]006:role/OrganizationAccountAccessRole
6. What did you expect to happen?
With kops-1.29.2 the output shows proposed changes that need to be applied with --yes
AWS CLI is able to successfully get Route53 zones from the same shell:
$ aws route53 list-hosted-zones
{
    "HostedZones": [
        {
            "Id": "/hostedzone/Z0[redacted]",
            "Name": "k8s.dev3.us-west-2.example.com.",
            "CallerReference": "8e483d8f-0d3c-4bcc-9c68-ecb4dea807ae",
            "Config": {
                "PrivateZone": false
            },
            "ResourceRecordSetCount": 8
        }
    ]
}
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
https://gist.github.com/vitaliyf/cfddd9ad771ee613ee850bb9e2d3fe14
9. Anything else we need to know?
$ cat ~/.aws/config
[default]
region = us-west-2
[profile company-name]
aws_account_id = company-name
region = us-west-2
output = json
color = ff0000
[profile company-name-dev1]
role_arn = arn:aws:iam::[redacted]385:role/OrganizationAccountAccessRole
source_profile = company-name
[profile company-name-dev2]
role_arn = arn:aws:iam::[redacted]813:role/OrganizationAccountAccessRole
source_profile = company-name
color = 00ff00
[profile company-name-dev3]
role_arn = arn:aws:iam::[redacted]006:role/OrganizationAccountAccessRole
source_profile = company-name
color = 0000ff
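To make the intended credential flow explicit, here is a minimal Python sketch (not kops or SDK code) that resolves the `source_profile` chain from a config like the one above. The `resolve_chain` helper, the trimmed sample text, and the placeholder account ID are illustrative assumptions. It reads only the `role_arn` and `source_profile` keys and ignores everything else the real SDKs support.

```python
import configparser

# Trimmed copy of the config above; the account ID is a made-up placeholder.
SAMPLE_CONFIG = """
[default]
region = us-west-2

[profile company-name]
region = us-west-2

[profile company-name-dev3]
role_arn = arn:aws:iam::000000000006:role/OrganizationAccountAccessRole
source_profile = company-name
"""


def resolve_chain(config_text, profile):
    """Return (profile, role_arn) hops, ordered from base credentials to target.

    Hypothetical helper for illustration: it follows source_profile links only.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    hops = []
    seen = set()
    current = profile
    while current is not None:
        if current in seen:
            raise ValueError(f"source_profile loop involving {current!r}")
        seen.add(current)
        # ~/.aws/config names sections "profile <name>", except for "default".
        section = "default" if current == "default" else f"profile {current}"
        if not parser.has_section(section):
            raise KeyError(f"profile {current!r} not found")
        hops.append((current, parser.get(section, "role_arn", fallback=None)))
        current = parser.get(section, "source_profile", fallback=None)

    # Collected target-first; reverse so the base credentials come first.
    hops.reverse()
    return hops
```

For `company-name-dev3` this yields two hops: the base `company-name` credentials with no role, then a single AssumeRole into `OrganizationAccountAccessRole`. The error in step 5, by contrast, shows a session that already holds that role calling AssumeRole on it again.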
This cluster has been continuously upgraded one kops/kubernetes version at a time for at least a couple of years, so it is pretty routine for us to test and execute such upgrades in-place.
I looked around, and I suspect this is related to the aws-sdk-go-v2 upgrade.
For example, the SDK has this issue: https://github.com/aws/aws-sdk-go-v2/issues/2686 . Coincidentally or not, that ticket is referenced by https://github.com/cert-manager/cert-manager/pull/7236 , where they are also dealing with a "Missing Region" error, just like https://github.com/kubernetes/kops/issues/16645 from kops-1.30.0.
Workaround: use awsudo or other workarounds from https://kops.sigs.k8s.io/mfa/#the-workaround-2
$ awsudo company-name-dev3 kops_v1.30.1 update cluster
...
+ NODEUP_URL_AMD64=https://artifacts.k8s.io/binaries/kops/1.30.1/linux/amd64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.30.1/nodeup-linux-amd64
- NODEUP_URL_AMD64=https://artifacts.k8s.io/binaries/kops/1.29.2/linux/amd64/nodeup,https://github.com/kubernetes/kops/releases/download/v1.29.2/nodeup-linux-amd64
...more as-expected output..
Must specify --yes to apply changes
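The general idea behind this kind of workaround (resolve the role once, then hand plain temporary credentials to kops through the environment) can be sketched in Python. This is an illustration under assumptions, not awsudo's actual implementation: `credentials_to_env` and `child_environment` are hypothetical helpers, and the sample response uses fake values in the shape that `aws sts assume-role` prints.

```python
import json

# Fake values, shaped like the JSON printed by `aws sts assume-role`.
SAMPLE_RESPONSE = """
{
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "secretEXAMPLE",
        "SessionToken": "tokenEXAMPLE",
        "Expiration": "2024-09-20T15:31:06+00:00"
    }
}
"""


def credentials_to_env(assume_role_response):
    """Map an AssumeRole response to the standard AWS credential variables."""
    creds = assume_role_response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


def child_environment(base_env, assume_role_response):
    """Return a copy of base_env with the temporary credentials added."""
    env = dict(base_env)  # copy, so the caller's environment is untouched
    env.update(credentials_to_env(assume_role_response))
    return env
```

The resulting dictionary could then be passed as `env=` to `subprocess.run` when launching kops, so the child process receives ready-made session credentials and does not need to resolve the profile's role itself.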
FYI @rifelpet
Any progress on this?
I don't have a sufficient AWS Organization setup to reproduce this bug, and given that the awsudo workaround is straightforward, I haven't made any progress.
If anyone wants to contribute a fix to the AWS SDK code in this file, I'm happy to review it and test it against other ~/.aws/config setups.
Hey @rifelpet, it looks like the cause may be that we configure the region for STS when we obtain it from IMDS, and that configuring STS this way causes issues for some configs.
I've tested the following change locally and it seems to be working for me: https://github.com/jvaldron/kops/commit/fed5ac36417203ec2446a5041bbc5ef9bf01485d
If this makes sense, I'll open a PR for it.
@jValdron yes, feel free to open a PR for it.