aws-sdk-ruby
aws-sdk-ruby copied to clipboard
Aws::CloudWatch::Client doesn't automatically refresh credentials
Describe the bug
We use the sidekiq-cloudwatchmetrics gem to send Sidekiq-related metrics to CloudWatch. Occasionally, we encounter the following error:
Aws::CloudWatch::Errors::ExpiredToken: The security token included in the request is expired
Upon investigation, we found that the gem initializes an Aws::CloudWatch::Client, which uses Aws::InstanceProfileCredentials. These credentials are supposed to auto-refresh before expiration. However, the error indicates that the token is not being refreshed in time, causing the metrics to fail. Once the error starts, it repeats, and the metrics fail to recover.
Regression Issue
- [ ] Select this option if this issue appears to be a regression.
Expected Behavior
The Aws::InstanceProfileCredentials should automatically refresh the security token before expiration (approximately 5 minutes before expiry) and ensure that the put_metric_data API call succeeds without errors. Metrics should continue to be sent to CloudWatch without interruption.
Current Behavior
The Aws::CloudWatch::Client raises Aws::CloudWatch::Errors::ExpiredToken errors when attempting to send metrics.
Once this error occurs:
- The metrics stop being sent to CloudWatch.
- The error persists until the process is restarted or manually handled.
- The auto-refresh mechanism in
Aws::InstanceProfileCredentialsdoes not appear to trigger correctly in this context.
Reproduction Steps
Configure the sidekiq-cloudwatchmetrics gem in a Sidekiq setup that uses an IAM role (e.g., on EC2 or ECS).
You can find the configuration I used here:
Allow the process to run for an extended period, relying on auto-refreshable credentials from Aws::InstanceProfileCredentials.
Observe that after some time, the following error may appear:
Aws::CloudWatch::Errors::ExpiredToken: The security token included in the request is expired
Possible Solution
No response
Additional Information/Context
Here are some useful links for more context regarding my problem:
- The issue I opened in the
sidekiq-cloudwatchmetricsrepo https://github.com/sj26/sidekiq-cloudwatchmetrics/issues/48 - The PR I opened to fix the problem in the
sidekiq-cloudwatchmetricsrepo https://github.com/sj26/sidekiq-cloudwatchmetrics/pull/49/files
Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version
aws-sdk-cloudwatch
Environment details (Version of Ruby, OS environment)
Ruby - 3.2.5, OS - Debian GNU/Linux 12 (bookworm), Docker image - ruby:3.2.5-slim, sidekiq-cloudwatchmetrics - 2.6.0
You are correct that the InstanceProfileCredentials should be auto refreshing (if they are within 5 minutes of expiration).
We added "static stability" logic awhile back that allow expired credentials to be used after their expiration when the underlying credentials service isn't refreshing credentials on the hosts (eg: an outage). Are you seeing any logs like Attempting credential expiration extension due to a credential service availability issue?
Have you tested your PR that does the explicit credentials refresh - does that fix the issue for you?
Hi @alextwoods,
I’m unable to reproduce the error, so I can’t verify whether the issue has been resolved. Even if my PR makes it to production, I won’t be able to confirm if it fixes the issue for me, as the error occurs randomly across different production environments.
Are you seeing any logs like
Attempting credential expiration extension due to a credential service availability issue?
No, I didn't find any logs such as what you suggested. I tried looking for anything containing the words 'credential' and 'attempting' (besides the exact match of the full message you sent) but I didn't find anything either.
This may be a naive question, but are you certain that the other gem is resolving instance profile credentials, and credentials are not set globally or in some other way? Reading your other thread, it looks like you tried creating a client on your own, and the credentials defaulted to instance profile credentials (which is last in the chain) but it's possible in your app in production, it is set to something else. I would actually output an instance of your credentials provider after deploying somehow.
Hi @mullermp,
are you certain that the other gem is resolving instance profile credentials, and credentials are not set globally or in some other way?
The other gem's CloudWatch publisher requires a client argument, and users can determine how this client is configured. In my case, I explicitly pass Aws::CloudWatch::Client.new(region: REGION_PLACEHOLDER), which resolves to instance profile credentials in the production environment.
I would actually output an instance of your credentials provider after deploying somehow
I shared this in the other thread as well, but here’s the truncated output of @client.config (where @client is the CloudWatch client I pass to the gem’s publisher) from a production environment:
#<struct
credentials=
#<Aws::InstanceProfileCredentials:0x00007f78ee260660
...
@credentials=#<Aws::Credentials access_key_id="PLACEHOLDER_ACCESS_KEY_ID">,
...>
...
secret_access_key=nil
session_token=nil
...>
This confirms that the client resolves to Aws::InstanceProfileCredentials, as expected in this setup.
Yes I saw that. Try passing an instance of instance profile credentials explicitly to the client in your production code. Looking at the gem source, I don't think any patch is necessary. The SDK will already refresh and retry if necessary.
@mullermp How come it doesn't already refresh and retry if it does use an instance of instance profile credentials?
It does and should. You can verify that by fetching instance credentials from irb or pry from your host and inspect them after accessing them before expiration. How did you verify the credentials class in production? Using a rails console or print statement? Or through local testing? By passing in credentials explicitly, we can be sure that is what is used. What version of aws-sdk-cloudwatch and aws-sdk-core are you using?
How did you verify the credentials class in production?
Rails console. I initialized an Aws::CloudWatch::Client object the way I do for thesidekiq-cloudwatchmetrics publisher and performed the checks on that object. You're saying the client credentials could differ when tested from rails console?
What version of aws-sdk-cloudwatch and aws-sdk-core are you using?
aws-sdk-cloudwatch (1.108.0) aws-sdk-core (3.214.0)
They shouldn't differ. Trust but verify, but did you pass rails env as production in your console? Are you setting global configuration anywhere such as Aws.config[key]?
Otherwise I'm not sure what may be going on. I'm inclined to believe it's something with your environment because if those credentials don't refresh then there would be a lot more noise and our internal usage would stop working.
did you pass rails env as production in your console?
We use a setup where the Rails environment name matches the server name (e.g., us, eu). The correct Rails env is automatically passed to the Rails console as an exported environment variable.
Are you setting global configuration anywhere such as Aws.config[key]?
No, we are not setting Aws.config[key] anywhere in our application.
if those credentials don't refresh then there would be a lot more noise and our internal usage would stop working.
Interestingly, the only component affected is the sidekiq-cloudwatchmetrics publisher. The app continues to function as expected, but metrics stop being sent to CloudWatch, which triggers an alarm after a short while.
The app continues to function as expected, but metrics stop being sent to CloudWatch, which triggers an alarm after a short while.
Do you use other AWS services in your application? If so, its interesting that you would get expired token issues only in cloudwatch.
Do you know if you are using IMDS V1 or V2? And do you know if you've disabled V1 through configuration (example the AWS_EC2_METADATA_V1_DISABLED env variable)? There are some differences in the refresh process for credentials from V1 and V2.
@alextwoods
Do you use other AWS services in your application?
Yes, S3, SSM, STS...
Do you know if you are using IMDS V1 or V2?
We're using IMDS V2.
And do you know if you've disabled V1 through configuration
Yes, IMDS V1 is disabled.
I believe the backtrace of the error could provide valuable insights for the investigation:
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/plugins/raise_response_errors.rb:17 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/invocation_id.rb:16 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/param_converter.rb:26 in call
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/plugins/request_callback.rb:89 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/response_paging.rb:12 in call
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/plugins/response_target.rb:24 in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in block in call
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/telemetry/no_op.rb:29 in in_span
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/telemetry.rb:53 in span_wrapper
/gems/aws-sdk-core-3.214.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in call
/gems/aws-sdk-core-3.214.0/lib/seahorse/client/request.rb:72 in send_request
/gems/aws-sdk-cloudwatch-1.108.0/lib/aws-sdk-cloudwatch/client.rb:3865 in put_metric_data
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:235 in block in publish
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:234 in each
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:234 in each_slice
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:234 in publish
/gems/sidekiq-cloudwatchmetrics-2.6.0/lib/sidekiq/cloudwatchmetrics.rb:77 in run
/gems/sidekiq-6.5.12/lib/sidekiq/component.rb:8 in watchdog
/gems/sidekiq-6.5.12/lib/sidekiq/component.rb:17 in block in safe_thread
I don't think that backtrace is helpful. One thing you can try is to add http_wire_trace: true to your client to see wire logs, and perhaps even add the http_debug_output to an instance of the instance profile credentials. This might give clues on what requests are being made. Note that these do not redact your credentials so be careful with logging and sharing.
Yeah, if possible adding http_debug_output to an instance of the instance profile credentials would be the most useful as otherwise theres no logging from the credential refresh. My best guess is that something unusual is going on with the refresh (maybe an error from the instance thats getting ignored?).
If you can't enable http_debug_output on an instance of the instance profile credentials that you're using to configure your cloudwatch client because of logging concerns, you could at least provide a before_refresh proc:
before_refresh = proc do |creds| # called with self
# do some logging....
end
credentials = Aws::InstanceProfileCredentials.new(before_refresh: before_refresh).
client = Aws::Cloudwatch::Client.new(credentials: credentials)
Are the http_debug_output logs outputted using the default rails logger? If so, is there some sort of way for me to output those logs to a specific file?
You can pass an instance of a logger (or technically anything that responds to <<) as the http_debug_output, eg:
credentials = Aws::InstanceProfileCredentials.new(http_debug_output: Rails.logger)
But again as a general warning - this will log the actual requests and responses which include credentials.
Hey @alextwoods
I had to verify it with our management team and sadly I can't do any interaction related to http_debug_output.
It seems like the current direction of resolving this issue is using the forked version of sidekiq-cloudwatchmetrics that contains my fix.
If I'll find that it does solve the issue, would it prove the need for a fix on your side?
Is there something else I can do to provide extra information?
I think we would want proof and root cause before taking a patch. Your fix in the other gem is a brute force solution which just retries after refreshing. I am inclined to believe it is something with your setup or environment.
@Roy-Gal-Git - yeah that makes sense. I'd recommend in that case at least adding a before_refresh proc that logs when refreshes are happening (see above: https://github.com/aws/aws-sdk-ruby/issues/3162#issuecomment-2581516054).
I do think it will be useful to know whether you're sidekiq patch does address the issue.
An additional possible cause - time drift on your instance - if the system clock is off, the refresh of credentials on in the instance itself will be incorrect. Are you running anything like chrony, Amazon time Sync service or some other NTP setup?
@alextwoods
I do think it will be useful to know whether you're sidekiq patch does address the issue.
I see that the current label is response-requested but It'll probably take more than a week for me to gain new insights from our environments...
Are you running anything like chrony, Amazon time Sync service or some other NTP setup?
No, we don't use those.
@mullermp @alextwoods I have 2 small updates:
- We found 2 more reoccurring errors that might be of the same root cause:
Aws::Errors::MissingCredentialsError: unable to sign request without credentials setBacktrace:
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:119 in rescue in initialize
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:108 in initialize
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:33 in new
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:33 in signer_for
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/sign.rb:45 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/transfer_encoding.rb:27 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:12 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/s3_signer.rb:53 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/redirects.rb:20 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:69 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/retry_errors.rb:365 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/retry_errors.rb:385 in with_metric
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/retry_errors.rb:365 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/md5s.rb:32 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/http_checksum.rb:20 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/endpoint_pattern.rb:30 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:137 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/request_compression.rb:94 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/request_compression.rb:104 in with_metric
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/request_compression.rb:94 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/rest/content_type_handler.rb:27 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/express_session_auth.rb:50 in block in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/express_session_auth.rb:56 in with_metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/express_session_auth.rb:50 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/expect_100_continue.rb:23 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/bucket_name_restrictions.rb:21 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/rest/handler.rb:10 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/recursion_detection.rb:18 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/endpoints.rb:52 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/endpoints.rb:66 in with_metrics
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/endpoints.rb:52 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/endpoint_discovery.rb:84 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/endpoint.rb:46 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/param_validator.rb:26 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/raise_response_errors.rb:16 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/sse_cpk.rb:24 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/dualstack.rb:21 in call
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/plugins/accelerate.rb:43 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/invocation_id.rb:16 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/idempotency_token.rb:19 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/param_converter.rb:26 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/request_callback.rb:89 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/response_paging.rb:12 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/plugins/response_target.rb:24 in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in block in call
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/telemetry/no_op.rb:29 in in_span
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/telemetry.rb:53 in span_wrapper
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/telemetry.rb:39 in call
/gems/aws-sdk-core-3.212.0/lib/seahorse/client/request.rb:72 in send_request
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/client.rb:16571 in put_object
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:66 in block in put_object
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:55 in block in open_file
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:55 in open
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:55 in open_file
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:65 in put_object
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:46 in block in upload
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/file_uploader.rb:40 in upload
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/customizations/object.rb:477 in block in upload_file
/gems/aws-sdk-core-3.212.0/lib/aws-sdk-core/plugins/user_agent.rb:60 in metric
/gems/aws-sdk-s3-1.169.0/lib/aws-sdk-s3/customizations/object.rb:476 in upload_file
And
Aws::Sigv4::Errors::MissingCredentialsError: missing credentials, provide credentials with one of the following options: - :access_key_id and :secret_access_key - :credentials - :credentials_providerBacktrace:
/gems/aws-sigv4-1.9.1/lib/aws-sigv4/signer.rb:717 in extract_credentials_provider
/gems/aws-sigv4-1.9.1/lib/aws-sigv4/signer.rb:147 in initialize
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/plugins/s3_signer.rb:235 in new
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/plugins/s3_signer.rb:235 in build_v4_signer
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/plugins/s3_signer.rb:14 in block in <class:S3Signer>
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:72 in call
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:215 in block in resolve_defaults
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:59 in each
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:59 in each
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:214 in resolve_defaults
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:207 in value_at
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:191 in block in resolve
/usr/local/lib/ruby/3.2.0/set.rb:511 in each_key
/usr/local/lib/ruby/3.2.0/set.rb:511 in each
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:191 in resolve
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:179 in apply_defaults
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/configuration.rb:152 in build!
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/base.rb:65 in build_config
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/base.rb:21 in initialize
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/client.rb:435 in initialize
/gems/aws-sdk-core-3.204.0/lib/seahorse/client/base.rb:102 in new
/gems/aws-sdk-s3-1.113.2/lib/aws-sdk-s3/resource.rb:28 in initialize
Do you think that the 3 errors are related in any kind of way?
- My sidekiq-cloudwatchmetrics patch is running in one of our environments for 2 weeks now. The (original) error didn't occur yet. I don't think that it's enough time to assume that the fix does work, but it looks like a good direction.
With the test for you are doing with the patch, are you also running with the before_refresh logging?
If the patch does work, we'll want to understand if the SDK is trying to refresh credentials already.
Those errors could be related - in general both are caused by some environment setup issue that results in the SDK being unable to find valid credentials during client initialization. Assuming these errors are happening from code running on EC2/ECS it would indicate a potential issue with IMDS not having valid credentials. This could happen for example if there is a clock skew issue on the host or something else is misconfigured.
@alextwoods
With the test for you are doing with the patch, are you also running with the before_refresh logging?
No, it's just the patch.
Assuming these errors are happening from code running on EC2/ECS it would indicate a potential issue with IMDS not having valid credentials.
It runs on EC2 machines that are managed by an Elastic Beanstalk environment.
Your previous code had expired credentials (a valid ruby object with just invalid credentials) but that new error indicates your entire credentials object has been blown away. Here is where that is raised: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems%2Faws-sdk-core%2Flib%2Faws-sdk-core%2Fplugins%2Fsign.rb. I do think it's still something with your setup or environment - no SDK code will remove credentials from configuration that I'm aware of.
@mullermp Interesting!
Is it possible that initializing Aws::S3::Resource objects with access_key_id and secret_access_key set to nil could be causing this issue?
Aws::S3::Resource.new(region: 'REGION_PLACEHOLDER', access_key_id: nil, secret_access_key: nil)
I also noticed that sometimes we pass a hardcoded region, while other times we use a memoized instance of Aws::EC2Metadata to fetch the EC2 instance's region from the metadata API. Could this discrepancy contribute to the credential drifts you mentioned?
For context, we always use the latest AMI for our Elastic Beanstalk environments and don’t make any manual changes to it. In practice, we run everything with Docker on ECS. Could something in this setup be affecting credential persistence?
Why are you setting a fake region and nil credentials in your client/resource?
@mullermp I just changed the region to the placeholder. It's 'us-east-1' if it matters.
The credentials are nil so they'll fetched from the default credential provider chain.
@alextwoods @mullermp
An additional possible cause - time drift on your instance - if the system clock is off, the refresh of credentials on in the instance itself will be incorrect. Are you running anything like chrony, Amazon time Sync service or some other NTP setup?
I was wrong when I said we don't use those services.
The instance is using Amazon Time Sync Service (169.254.169.123) with Chrony. The system time is accurate, with an offset of only a few nanoseconds, and chronyc tracking confirms synchronization is stable. Time drift is not an issue.
You should not need to set any keys to nil at all for the credentials provider chain to work. They can be omitted. At some point, your credentials are being blown away. I'm not sure why, but perhaps usage in your code base that we cannot see.