pulumi-aws
Interesting flake on TestAccDeleteBeforeCreate
* Retrieving AWS account details: validating provider credentials: retrieving caller identity from STS: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: 8b1a26fa-db29-43dd-b705-869eadafa74c, api error ExpiredToken: The security token included in the request is expired
https://github.com/pulumi/pulumi-aws/issues/3655
Possibly a misconfiguration on our part? There is a Stack Overflow troubleshooting topic with similar issues, but the region is set globally for our tests.
This cron is flaking on the STS GetCallerIdentity credentials verification:
Retrieving AWS account details: validating provider credentials: retrieving caller identity from STS: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: <redacted>, api error ExpiredToken: The security token included in the request is expired
Every flake shows this same error.
Interestingly, this test never flakes on master.
I deleted the EC2 instance that failed to be deleted in our last run.
I wonder if this is a token-expiration issue, since the Node.js tests take a little over an hour to run. But then why does the STS validation fail only on this test, and only intermittently? Should we perhaps skip credentials validation for this test?
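The expiration hypothesis comes down to simple arithmetic: credentials issued once must still be valid when the last test finishes. Below is a minimal sketch of that check. The function name and the five-minute safety margin are hypothetical, and the one-hour session length and 65-minute runtime are assumed numbers, not values confirmed in this thread.

```python
from datetime import timedelta


def credentials_outlast_job(
    session_duration: timedelta,
    wait_before_tests: timedelta,
    test_runtime: timedelta,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Hypothetical check: are the credentials still valid, with `margin`
    to spare, when the last test finishes?

    Assumes the credentials are issued once, at the start of
    `wait_before_tests`, and are never refreshed during the run.
    """
    elapsed = wait_before_tests + test_runtime
    return session_duration - elapsed >= margin


# Assumed numbers: a 1-hour session and a run of "a little over an hour".
one_hour = timedelta(hours=1)
runtime = timedelta(minutes=65)

print(credentials_outlast_job(one_hour, timedelta(0), runtime))      # False
print(credentials_outlast_job(2 * one_hour, timedelta(0), runtime))  # True
```

Under these assumptions a 1-hour session runs out mid-test, while a 2-hour session leaves 55 minutes of headroom. Any time spent waiting after the credentials are issued comes straight out of that headroom.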
UPDATE: this is now flaking on pull requests as well.
FWIW, it also flakes on master in https://github.com/pulumi/pulumi-aws/issues/3636. Not sure what's going on here.
We saw some flakes like this in the service, where credentials would expire during the life of the job. We were using the key-rotator at the time; futzing with that only got us so far. We eventually rolled out some OIDC magic that let the job seamlessly assume a role with long-enough-lasting credentials. @kmosher would know more.
Waiting on a runner is what is eating up most of the authenticated time:
2024-03-20T18:07:34.4133831Z Requested labels: ubuntu-latest
2024-03-20T18:07:34.4134259Z Job defined at: pulumi/pulumi-aws/.github/workflows/run-acceptance-tests.yml@refs/pull/3664/merge
2024-03-20T18:07:34.4134450Z Waiting for a runner to pick up this job...
2024-03-20T18:07:34.7221118Z Job is waiting for a hosted runner to come online.
2024-03-20T18:07:39.3277658Z Job is about to start running on the hosted runner: GitHub Actions 12 (hosted)
(the above is output from a running job after ~15 minutes)
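To measure how long a job sat in the queue, the timestamps on these log lines can be diffed. Here is a minimal sketch, assuming the log format shown above. The helper names are hypothetical, and the two marker phrases are copied from this excerpt; other runner types may word these lines differently.

```python
from datetime import datetime


def parse_runner_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a GitHub Actions log line.

    The logs use 7 fractional digits (e.g. 2024-03-20T18:07:34.4133831Z),
    but strptime's %f accepts at most 6, so the extra digit is dropped.
    """
    stamp = line.split(" ", 1)[0].rstrip("Z")
    date_part, frac = stamp.split(".")
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def runner_queue_seconds(lines: list[str]) -> float:
    """Seconds from 'Requested labels' until the job starts on a runner."""
    requested = next(line for line in lines if "Requested labels" in line)
    started = next(line for line in lines if "Job is about to start running" in line)
    delta = parse_runner_timestamp(started) - parse_runner_timestamp(requested)
    return delta.total_seconds()


log = [
    "2024-03-20T18:07:34.4133831Z Requested labels: ubuntu-latest",
    "2024-03-20T18:07:34.4134450Z Waiting for a runner to pick up this job...",
    "2024-03-20T18:07:39.3277658Z Job is about to start running on the hosted runner: GitHub Actions 12 (hosted)",
]
print(f"{runner_queue_seconds(log):.1f} s")  # 4.9 s
```

Run against the full log of a slow job, the same diff gives the queue time to weigh against the credential lifetime.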
Fixed via #3666
Another instance: https://github.com/pulumi/pulumi-aws/actions/runs/8438448044
I think https://github.com/pulumi/ci-mgmt/pull/863 ultimately fixed it by doubling the timeout window.