Default STS Client uses global endpoint but recommendation is now regional endpoints

Open bacoboy opened this issue 1 year ago • 9 comments

Describe the feature

Around 2022, AWS announced that all new SDKs would change the default STS endpoint behavior from the legacy endpoint to regional as documented here.

All new SDK major versions releasing after July 2022 will default to regional. New SDK major versions might remove this setting and use regional behavior. To reduce future impact regarding this change, we recommend you start using regional in your application when possible.

This is used when clients call sts:AssumeRole. Using the legacy behavior, clients connect to sts.amazonaws.com, which lives in us-east-1. Workloads outside of that region using this configuration unknowingly depend on that region since they are not using the regional endpoint where their code runs.
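To make the difference concrete, here is a minimal sketch of the two hostnames involved. It is a simplified model, not botocore's actual endpoint resolver: the function name `sts_hostname` is hypothetical, and it assumes the standard `aws` partition only.

```python
def sts_hostname(region: str, mode: str = "legacy") -> str:
    """Return the STS hostname a default client would target.

    Simplified model (assumption): standard `aws` partition only.
    botocore's real resolver handles more cases than this.
    """
    if mode == "legacy":
        # Legacy behavior: the single global endpoint, served from us-east-1.
        return "sts.amazonaws.com"
    if mode == "regional":
        # Regional behavior: the endpoint in the region where the code runs.
        return f"sts.{region}.amazonaws.com"
    raise ValueError(f"unknown mode: {mode!r}")


# A workload in eu-west-1 under each setting.
print(sts_hostname("eu-west-1", "legacy"))
print(sts_hostname("eu-west-1", "regional"))
```

Under the legacy setting the region argument has no effect on the result, which is the hidden us-east-1 dependency described above.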

botocore was never updated, so today, all calls to the STS API, unless otherwise explicitly configured, will use the "global" endpoint in us-east-1.

There was an outage in Aug 2024 which impacted STS in us-east-1.

Had botocore been updated, this specific event would not have impaired workloads running in other regions using a default client.

This PR attempts to align the new "default" to regional as specified by the documentation.

Should you require the old behavior, you can always set the environment variable to override the new default back to legacy (as documented):

export AWS_STS_REGIONAL_ENDPOINTS=legacy
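The same override can be applied from Python rather than the shell. A minimal sketch, assuming the variable is set before the STS client is constructed; the helper name `pin_sts_endpoint_mode` is hypothetical, while the variable name and the values `legacy` and `regional` are the documented ones.

```python
import os

# The two documented values for this setting.
VALID_MODES = ("legacy", "regional")


def pin_sts_endpoint_mode(mode, env=None):
    """Set AWS_STS_REGIONAL_ENDPOINTS in `env` (defaults to os.environ).

    Hypothetical helper: it only writes the environment variable and
    does not create or reconfigure any client.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    target = os.environ if env is None else env
    target["AWS_STS_REGIONAL_ENDPOINTS"] = mode
    return target


# Demonstrated against a plain dict so the real environment is untouched.
demo_env = pin_sts_endpoint_mode("legacy", env={})
print(demo_env)
```

In an application, calling `pin_sts_endpoint_mode("legacy")` early, before any `boto3.client("sts")` call, should have the same effect as the `export` line.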

A follow-up change to the documentation here will be needed to reflect this change.

Use Case

Any workload running in a region should, by default, use the regional STS endpoint for role assumption.

Proposed Solution

I've started a pull request here with the proposed change to the defaults, so the default will select regional endpoints rather than the legacy configuration if no additional configuration is specified.

  • https://github.com/boto/botocore/pull/3309

Other Information

No response

Acknowledgements

  • [X] I may be able to implement this feature request
  • [ ] This feature might incur a breaking change

SDK version used

Any current boto3 version

Environment details (OS name and version, etc.)

N/A - changes to SDK defaults

bacoboy avatar Nov 23 '24 23:11 bacoboy

Thanks for the feature request. However, it would be a breaking change for users who expect and rely on the current behavior. The team is considering the implications of making this change in the future. In the meantime, you can use the AWS_STS_REGIONAL_ENDPOINTS environment variable or the sts_regional_endpoints configuration setting (documented here). The existing documentation could potentially be improved here to clarify the expected behavior and workarounds.
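For the configuration-file route, the setting goes in the shared AWS config file. The sketch below renders such a fragment with Python's standard `configparser`; the function name `render_aws_config` is hypothetical, and it only covers the `default` profile (named profiles use a `profile NAME` section header in that file).

```python
import configparser
import io


def render_aws_config(mode="regional", section="default"):
    """Render a shared-config fragment that sets sts_regional_endpoints.

    Hypothetical helper for illustration; it returns text and does not
    touch ~/.aws/config.
    """
    config = configparser.ConfigParser()
    config[section] = {"sts_regional_endpoints": mode}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_aws_config())
```

The printed fragment can be merged into `~/.aws/config` by hand. Setting `mode="legacy"` instead pins the current behavior.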

tim-finnigan avatar Nov 25 '24 21:11 tim-finnigan

It is a change, but it isn't a breaking one. Can you elaborate on how this breaks anything? Thanks.

bacoboy avatar Nov 25 '24 22:11 bacoboy

It is a change, but it isn't a breaking one. Can you elaborate on how this breaks anything? Thanks.

This would be a breaking change for customers who expect the default configuration to be legacy. It could affect applications that rely on the global STS endpoint without explicitly configuring it. These apps might experience unexpected behavior or failures when suddenly using regional endpoints instead.

tim-finnigan avatar Nov 26 '24 20:11 tim-finnigan

You just described every change made to this library.

As I understand it, only very old token usage falls into this category, and those folks are unlikely to upgrade the library, much less their token usage.

Can't simple downgrade instructions be provided in the release notes for the edge cases that run into issues?

bacoboy avatar Dec 01 '24 19:12 bacoboy

This was also brought up 3 years ago in https://github.com/boto/botocore/issues/2577 and dismissed for the same reason. Is the plan to never upgrade this, regardless of the AWS ask to move to the new behavior?

bacoboy avatar Dec 09 '24 18:12 bacoboy

We published "Updating AWS SDK defaults – AWS STS service endpoint and Retry Strategy" this week, announcing that we're planning to make this change for existing SDKs on July 31st, 2025.

Since this is a change in behavior, as discussed above, the roughly six-month window allows users either to test the new behavior by opting in early or to pin to legacy if needed, so that behavior won't change upon a future upgrade.

ashovlin avatar Feb 12 '25 14:02 ashovlin

This is excellent news! It makes sense to have that transition period to give folks time. Specific details about when legacy might be needed would be helpful. I was told there were some older edge cases, but the details were light.

bacoboy avatar Feb 12 '25 17:02 bacoboy

We see some users allowlist specific AWS API endpoints in their network configuration. They might start blocking requests when SDKs change the endpoint upon upgrade for existing code paths.

The global endpoint vends tokens in a different format from the regional endpoints, notably with a different length. This one is a little less likely than the first, but we can't tell whether folks have made assumptions about the current format in their code.
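For the allowlist case, a quick pre-upgrade check is to compare the hostnames a network policy permits against the regional STS hostnames the workloads would start using. A rough sketch, assuming an exact-hostname allowlist and the standard `aws` partition; the function name and the sample data are hypothetical.

```python
def missing_regional_sts_hosts(allowlist, regions):
    """Return regional STS hostnames that are absent from the allowlist.

    Assumptions: the allowlist holds exact hostnames (no wildcards or
    IP ranges) and all regions are in the standard `aws` partition.
    """
    allowed = {host.strip().lower() for host in allowlist}
    missing = []
    for region in regions:
        host = f"sts.{region}.amazonaws.com"
        if host not in allowed:
            missing.append(host)
    return missing


# Hypothetical policy that only anticipated the global endpoint
# plus one regional endpoint.
allowlist = ["sts.amazonaws.com", "sts.us-west-2.amazonaws.com"]
print(missing_regional_sts_hosts(allowlist, ["us-west-2", "eu-west-1"]))
```

Any hostname this reports would need to be added to the allowlist before the new default applies, or the workload would need to be pinned to legacy.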

ashovlin avatar Feb 26 '25 19:02 ashovlin

Now that the backend has been modified to handle global endpoints regionally by default, the documentation should likely be updated to reflect this new behavior.

https://aws.amazon.com/about-aws/whats-new/2025/04/aws-sts-global-endpoint-requests-locally-regions-default/

bacoboy avatar Apr 28 '25 15:04 bacoboy