aws-sdk-ruby

credentials static web identity credentials not picked up without an explicit profile option

Open HoneyryderChuck opened this issue 7 months ago • 5 comments

Describe the bug

We're using STS web identity token credentials to authenticate AWS SDK service calls via SigV4 (aws-sdk-core 3.211.0 should be the relevant bit). During load testing, however, we noticed some rate limiting, which we narrowed down to multiple calls to the STS token refresh endpoint, exacerbated by the scaling up of certain services. After some investigation, we figured out that multiple clients (for SQS, SNS, etc.) were instantiated, and the token refresh path was called for each of them. This is not ideal: there should be a single "refresh" happening, and the token should ideally be shared across the multiple clients.

In the process, we found out that the AWS SDKs already support that via "shared config", so we changed our approach to use it, replacing the env var setup using AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN with an AWS_CONFIG_FILE pointing to a file like this:

[default]
web_identity_token_file = $AWS_WEB_IDENTITY_TOKEN_FILE
role_arn = $AWS_ROLE_ARN

Expected Behavior

This should have worked, as in, when instantiating clients multiple times, there should only be one call to STS. This can be observed using this script:

Aws.config[:http_wire_trace] = true

Aws::SQS::Client.new
# one STS request log
Aws::SQS::Client.new
# no STS request log

Current Behavior

When instantiating two client instances, two STS requests are made.

While we noticed that Aws.shared_config was correctly populated with the expected values, calling e.g. Aws::SQS::Client.new was unfortunately still going through this strategy and generating a token + refresh loop per client, because of this clause: the routine expects a profile to be set, but it's nil as per the option setup, although the docs say it defaults to "default".

I believe this is a bug: if the profile were "default", this would have worked. I also tried setting the AWS_PROFILE env var, but that doesn't populate it either.

FWIW, setting Aws.config[:profile] = "default", or passing the option explicitly à la Aws::SQS::Client.new(profile: "default"), works as expected (shared config is picked up). However, I'm looking for a "no config code" setup that can be rolled out across multiple services.

Reproduction Steps

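require "aws-sdk-sqs"

# Log HTTP traffic so that each STS request shows up in the output.
Aws.config[:http_wire_trace] = true

Aws::SQS::Client.new
# one STS request log
Aws::SQS::Client.new
# expected: no STS request log; actual: a second STS request is logged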

Possible Solution

Perhaps initializing the profile option to whatever AWS_PROFILE defines?

Additional Information/Context

No response

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-core 3.211.0

Environment details (Version of Ruby, OS environment)

"ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [x86_64-linux]"

HoneyryderChuck • Apr 29 '25 15:04

Thanks for opening an issue. I'm not sure if it's possible (or safe) to have that credentials object shared between two clients implicitly. Each time a client is constructed, we go through the credentials provider chain, to determine the credentials to use. We don't want to assume the same credentials are used for all clients, especially in cases where you call a service to retrieve a new set of credentials, to use for another service (e.g. STS and SSO).

The profile does default to "default" but not through the config struct, it looks to be defaulting in the credential provider chain here. I'm not sure how the setting of profile worked for you - from what I see, it would go to this path and that would still create a new instance of the credentials object each time.

I know you're looking for a "no code solution" but I think your easiest/best option is to set Aws.config[:credentials] to an instance of AssumeRoleWebIdentityCredentials if all of your clients use it - you will also skip the whole chain process and it would improve your initialization time.
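A minimal sketch of that suggestion, assuming the role ARN and token file come from the same env vars the issue mentions. The helper names `web_identity_options` and `install_shared_credentials!` and the fallback session name are hypothetical. As far as I can tell, constructing the credentials object fetches credentials from STS right away, so the installer is only defined here, not invoked.

```ruby
# Collect the constructor options from the environment.
# `env` is injectable so the mapping can be checked without touching ENV.
def web_identity_options(env = ENV)
  {
    role_arn: env.fetch("AWS_ROLE_ARN"),
    web_identity_token_file: env.fetch("AWS_WEB_IDENTITY_TOKEN_FILE"),
    # Hypothetical fallback session name, not taken from the thread.
    role_session_name: env.fetch("AWS_ROLE_SESSION_NAME", "shared-web-identity")
  }
end

# Install one credentials object that every client built afterwards reuses.
def install_shared_credentials!(env = ENV)
  require "aws-sdk-core" # loaded lazily so the helper above has no dependencies
  Aws.config[:credentials] =
    Aws::AssumeRoleWebIdentityCredentials.new(**web_identity_options(env))
end
```

Calling `install_shared_credentials!` once at boot means later `Aws::SQS::Client.new` and `Aws::SNS::Client.new` calls pick up the same object from `Aws.config` instead of walking the provider chain.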

mullermp • Apr 29 '25 18:04

Hey @mullermp 👋 thx for the reply.

> I'm not sure how the setting of profile worked for you - from what I see, it would go to this path and that would still create a new instance of the credentials object each time.

You're correct 😭 I may have misread the results of my tests yesterday (should wait for the next morning before opening issues), but I can confirm that an STS request is issued per client.

So I guess this just became a feature request 😂

> I'm not sure if it's possible (or safe) to have that credentials object shared between two clients implicitly.

I think you're right that, currently, running .resolve for shared web identity tokens may not be safe, since calling it from different threads may trigger separate requests to STS to get a token; whether those would return the same or different valid tokens is something you can answer better than me.

I think it wouldn't be hard to make it safe though, by using a similar strategy (and perhaps the same mutex?) used when refreshing the credentials.

> I know you're looking for a "no code solution" but I think your easiest/best option is to set Aws.config[:credentials] to an instance of AssumeRoleWebIdentityCredentials if all of your clients use it

Indeed, that's what we're doing. The main issue is having to copy and maintain that snippet across different apps / versions of the AWS SDK. There's also the slight issue of "fork-safeness" though: the web credentials object has a mutex (to manage the token refresh), so if I'm eager-loading the credentials, e.g. in a Rails application initializer, then I'd be sharing a mutex across forked processes, which is not (correct me if I'm wrong) fork-safe (at least the default lazy approach of the AWS SDK protects us from that somewhat, right?).
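One way to sidestep the eager-loading concern is to defer construction until first use and rebuild the object when the process ID changes. The sketch below is stdlib-only; the `PerProcessLazy` class is hypothetical and not part of the AWS SDK. It only guarantees that the wrapped object (and any mutex inside it) is created in the process that uses it, and makes no claim about how Ruby treats inherited mutexes after a fork.

```ruby
# Builds a value lazily, once per process. A forked child sees a different
# Process.pid and therefore builds its own copy on first use.
class PerProcessLazy
  def initialize(&builder)
    @builder = builder
    @mutex = Mutex.new # guards the holder itself, never held across a build in the parent unless `get` is called there
    @owner_pid = nil
    @value = nil
  end

  def get
    @mutex.synchronize do
      if @owner_pid != Process.pid
        @value = @builder.call
        @owner_pid = Process.pid # only recorded once the build succeeded
      end
      @value
    end
  end
end
```

A holder like this could wrap the credentials constructor, with `get` called from wherever the app configures the SDK after forking (where exactly is application-specific).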

Nevertheless, I'm ready for this to be rejected. My initial proposal was hanging on this behaviour for shared config / credentials being the default for all other SDKs, and given that this is not a bug as initially reported, it probably means that this is not supported across SDKs, and that'd be important in my use case as well (which makes this a bit unattainable).

HoneyryderChuck • Apr 30 '25 14:04

Sorry for the delayed response. I think we could possibly do this in a new major version, which we are working towards. Credential resolution will be fundamentally different, however: credentials will be resolved at request time rather than client initialization time. This is because we have new requirements for supporting multiple auth types (sigv4 and also bearer tokens). At request time, we would resolve the auth scheme, and then try to fetch auth information for that scheme. I think what we would probably do is check shared config sources, and then cache those credentials. We would go a step further and probably make shared config a Singleton and return the same credentials object per profile for all clients. I will have to discuss this with the larger SDK team for feedback. In any event, I do think this may be an upsetting enough behavior change that I can't do it in place.

mullermp • May 10 '25 02:05

Thx for getting back in touch 🙏 didn't know that there was a v4 in the works.

> At request time, we would resolve the auth scheme, and then try to fetch auth information for that scheme.

Correct me if I'm wrong, but this means that, if you don't apply something like what this ticket is proposing (I guess this is what you mean by the shared config singleton), at least for the STS token case, it'll generate a token per-request, which could trigger rate limiting more often, right?

HoneyryderChuck • May 12 '25 10:05

It shouldn't - it just means that the first credentials fetch will be at the first operation invocation rather than Client.new, because in Smithy (the modeling language new SDKs are moving towards), authentication is determined per operation instead of per service client, which allows clients like Bedrock to support either bearer tokens or sigv4 credentials. On that first fetch, we need to cache the credentials object, so that subsequent requests will use it. I think we need yet another caching layer (which could be done in this major version except that it would potentially cause breaking behavior?) where any credential provider resolved via shared config is also cached per profile, so that multiple clients will always resolve to that credentials provider.
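The two ideas described here, resolving at the first operation rather than at construction and caching one provider per profile, can be sketched with the standard library alone. `ProfileCredentialsCache` and `SketchClient` are hypothetical names for illustration and do not reflect how the SDK is or will be implemented.

```ruby
# Caches one resolved credentials object per profile name.
class ProfileCredentialsCache
  def initialize(&resolver)
    @resolver = resolver # called as resolver.call(profile) on a cache miss
    @mutex = Mutex.new
    @by_profile = {}
  end

  def fetch(profile = "default")
    @mutex.synchronize do
      @by_profile.fetch(profile) do
        @by_profile[profile] = @resolver.call(profile)
      end
    end
  end
end

# Stand-in for a service client: nothing is resolved in the constructor,
# only when an operation is invoked.
class SketchClient
  def initialize(cache, profile: "default")
    @cache = cache
    @profile = profile
  end

  def call_operation
    @cache.fetch(@profile) # same object for every client on this profile
  end
end
```

A single mutex keeps the sketch short but serializes resolution across profiles; a per-profile lock would avoid that at the cost of more bookkeeping.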

mullermp • May 12 '25 20:05