aws-sdk-kotlin
aws-sdk-kotlin copied to clipboard
Credential provider fails intermittently on ECS (fargate)
Describe the bug
While running some code that processes messages rapidly from SQS, I saw a frequent, but irregular errors loading credentials for the worker under load. The set up is 1 dropwizard command process that starts up 10 CompletableFutures
from a thread pool. Each of those threads then rapidly polls for messages on the queue, process it, and then goes back to polling. Currently each time it polls it reauths with SqsClient.fromEnvironment()
(not ideal, but didn't expect an issue like this when using built-in creds). This is all running on one ECS task at a time (The error forced rerunning the task a few times in the end).
Over the course of processing about 15k messages the error happened 5 times. Below is a single exception pulled from the logs:
aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: No credentials could be loaded from the chain: CredentialsProviderChain -> EnvironmentCredentialsProvider -> ProfileCredentialsProvider -> StsWebIdentityProvider -> EcsCredentialsProvider -> ImdsCredentialsProvider
at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain$getCredentials$chainException$1.invoke(CredentialsProviderChain.kt:36)
at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain$getCredentials$chainException$1.invoke(CredentialsProviderChain.kt:36)
at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials$suspendImpl(CredentialsProviderChain.kt:42)
at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials(CredentialsProviderChain.kt)
at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider$getCredentials$2.invokeSuspend(CachedCredentialsProvider.kt:58)
at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider$getCredentials$2.invoke(CachedCredentialsProvider.kt)
at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider$getCredentials$2.invoke(CachedCredentialsProvider.kt)
at aws.sdk.kotlin.runtime.config.CachedValue.getOrLoad(CachedValue.kt:59)
at aws.sdk.kotlin.runtime.auth.credentials.CachedCredentialsProvider.getCredentials(CachedCredentialsProvider.kt:55)
at aws.sdk.kotlin.runtime.auth.credentials.DefaultChainCredentialsProvider.getCredentials(DefaultChainCredentialsProvider.kt:68)
at aws.smithy.kotlin.runtime.auth.awssigning.DefaultAwsSignerImpl.sign(DefaultAwsSigner.kt:26)
at aws.smithy.kotlin.runtime.auth.awssigning.middleware.AwsSigningMiddleware.modifyRequest(AwsSigningMiddleware.kt:158)
at aws.smithy.kotlin.runtime.auth.awssigning.middleware.AwsSigningMiddleware.modifyRequest(AwsSigningMiddleware.kt:20)
at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.io.middleware.Phase.handle(Phase.kt:64)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.http.operation.MutateHandler.call(SdkOperationExecution.kt:126)
at aws.smithy.kotlin.runtime.http.operation.MutateHandler.call(SdkOperationExecution.kt:123)
at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.http.middleware.Retry$handle$outcome$1.invokeSuspend(Retry.kt:52)
at aws.smithy.kotlin.runtime.http.middleware.Retry$handle$outcome$1.invoke(Retry.kt)
at aws.smithy.kotlin.runtime.http.middleware.Retry$handle$outcome$1.invoke(Retry.kt)
at aws.smithy.kotlin.runtime.retries.StandardRetryStrategy.doTryLoop(StandardRetryStrategy.kt:51)
at aws.smithy.kotlin.runtime.retries.StandardRetryStrategy.retry(StandardRetryStrategy.kt:32)
at aws.smithy.kotlin.runtime.http.middleware.Retry.handle$suspendImpl(Retry.kt:36)
at aws.smithy.kotlin.runtime.http.middleware.Retry.handle(Retry.kt)
at aws.sdk.kotlin.runtime.http.middleware.AwsRetryMiddleware.handle(AwsRetryMiddleware.kt:38)
at aws.sdk.kotlin.runtime.http.middleware.AwsRetryMiddleware.handle(AwsRetryMiddleware.kt:30)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.io.middleware.ModifyRequestMiddleware.handle(ModifyRequest.kt:24)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.io.middleware.Phase.handle(Phase.kt:64)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.http.operation.SerializeHandler.call(SdkOperationExecution.kt:119)
at aws.smithy.kotlin.runtime.http.operation.SerializeHandler.call(SdkOperationExecution.kt:102)
at aws.smithy.kotlin.runtime.http.operation.InitializeHandler.call(SdkOperationExecution.kt:99)
at aws.smithy.kotlin.runtime.io.middleware.Phase.handle(Phase.kt:60)
at aws.smithy.kotlin.runtime.io.middleware.DecoratedHandler.call(Middleware.kt:40)
at aws.smithy.kotlin.runtime.http.operation.SdkHttpOperationKt.execute(SdkHttpOperation.kt:83)
at aws.smithy.kotlin.runtime.http.operation.SdkHttpOperationKt.roundTrip(SdkHttpOperation.kt:66)
at aws.sdk.kotlin.services.sqs.DefaultSqsClient.receiveMessage(DefaultSqsClient.kt:620)
at com.example.Sqs$poll$1.invokeSuspend(SqsTaskChannel.kt:36)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
at com.example.Sqs.poll(SqsTaskChannel.kt:34)
at com.example.Task.processNextAvailableTask(TaskService.kt:19)
at com.example.Run.run$lambda-3$lambda-1$lambda-0(RunTasks.kt:69)
at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
... 4 common frames omitted
Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: Missing value for environment variable `AWS_ACCESS_KEY_ID`
at aws.sdk.kotlin.runtime.auth.credentials.EnvironmentCredentialsProvider.requireEnv(EnvironmentCredentialsProvider.kt:23)
at aws.sdk.kotlin.runtime.auth.credentials.EnvironmentCredentialsProvider.getCredentials(EnvironmentCredentialsProvider.kt:26)
at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials$suspendImpl(CredentialsProviderChain.kt:39)
... 61 common frames omitted
Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: could not find source profile default
at aws.sdk.kotlin.runtime.auth.credentials.profile.ProfileChain$Companion.resolve$aws_config(ProfileChain.kt:44)
at aws.sdk.kotlin.runtime.auth.credentials.ProfileCredentialsProvider.getCredentials(ProfileCredentialsProvider.kt:101)
at aws.sdk.kotlin.runtime.auth.credentials.ProfileCredentialsProvider$getCredentials$1.invokeSuspend(ProfileCredentialsProvider.kt)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
... 14 common frames omitted
Suppressed: aws.sdk.kotlin.runtime.auth.credentials.ProviderConfigurationException: Required field `roleArn` could not be automatically inferred for StsWebIdentityCredentialsProvider. Either explicitly pass a value, set the environment variable `AWS_ROLE_ARN`, or set the JVM system property `aws.roleArn`
at aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityCredentialsProvider$Companion.fromEnvironment-TUY-ock(StsWebIdentityCredentialsProvider.kt:133)
at aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityCredentialsProvider$Companion.fromEnvironment-TUY-ock$default(StsWebIdentityCredentialsProvider.kt:54)
at aws.sdk.kotlin.runtime.auth.credentials.StsWebIdentityProvider.getCredentials(DefaultChainCredentialsProvider.kt:87)
at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain.getCredentials$suspendImpl(CredentialsProviderChain.kt:39)
at aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderChain$getCredentials$1.invokeSuspend(CredentialsProviderChain.kt)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
... 14 common frames omitted
Suppressed: aws.smithy.kotlin.runtime.serde.DeserializationException: Unexpected JSON token at offset 0; found `Y`, expected one of `{`, `[`, `\"`, `null`, `true`, `false`, `<number>`
at aws.smithy.kotlin.runtime.serde.json.JsonLexer.fail(JsonLexer.kt:356)
at aws.smithy.kotlin.runtime.serde.json.JsonLexer.fail$default(JsonLexer.kt:355)
at aws.smithy.kotlin.runtime.serde.json.JsonLexer.unexpectedToken(JsonLexer.kt:352)
at aws.smithy.kotlin.runtime.serde.json.JsonLexer.readToken(JsonLexer.kt:202)
at aws.smithy.kotlin.runtime.serde.json.JsonLexer.doPeek(JsonLexer.kt:85)
at aws.smithy.kotlin.runtime.serde.json.JsonLexer.peek(JsonLexer.kt:72)
at aws.smithy.kotlin.runtime.serde.json.JsonDeserializer.deserializeStruct(JsonDeserializer.kt:110)
at aws.sdk.kotlin.runtime.auth.credentials.JsonCredentialsDeserializerKt.deserializeJsonCredentials(JsonCredentialsDeserializer.kt:133)
... 18 common frames omitted
Wrapped by: aws.sdk.kotlin.runtime.auth.credentials.InvalidJsonCredentialsException: invalid JSON credentials response
at aws.sdk.kotlin.runtime.auth.credentials.JsonCredentialsDeserializerKt.deserializeJsonCredentials(JsonCredentialsDeserializer.kt:116)
at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsDeserializer.deserialize(EcsCredentialsProvider.kt:177)
at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsDeserializer$deserialize$1.invokeSuspend(EcsCredentialsProvider.kt)
... 16 common frames omitted
Wrapped by: aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: Failed to get credentials from container metadata service
at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsProvider.getCredentials(EcsCredentialsProvider.kt:114)
at aws.sdk.kotlin.runtime.auth.credentials.EcsCredentialsProvider$getCredentials$1.invokeSuspend(EcsCredentialsProvider.kt)
... 16 common frames omitted
Suppressed: java.net.SocketException: Invalid argument
at java.base/sun.nio.ch.Net.connect0(Native Method)
at java.base/sun.nio.ch.Net.connect(Net.java:579)
at java.base/sun.nio.ch.Net.connect(Net.java:568)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:633)
at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128)
at okhttp3.internal.connection.ConnectPlan.connectSocket(ConnectPlan.kt:254)
at okhttp3.internal.connection.ConnectPlan.connectTcp(ConnectPlan.kt:128)
at okhttp3.internal.connection.SequentialExchangeFinder.find(SequentialExchangeFinder.kt:33)
at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:267)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:84)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:65)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at datadog.trace.instrumentation.okhttp3.TracingInterceptor.intercept(TracingInterceptor.java:35)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:205)
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:533)
... 3 common frames omitted
Wrapped by: aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: failed to load instance profile
at aws.sdk.kotlin.runtime.auth.credentials.ImdsCredentialsProvider.getCredentials(ImdsCredentialsProvider.kt:62)
at aws.sdk.kotlin.runtime.auth.credentials.ImdsCredentialsProvider$getCredentials$1.invokeSuspend(ImdsCredentialsProvider.kt)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.UndispatchedCoroutine.afterResume(CoroutineContext.kt:233)
at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:102)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
... 14 common frames omitted
Wrapped by: java.util.concurrent.CompletionException: aws.sdk.kotlin.runtime.auth.credentials.CredentialsProviderException: No credentials could be loaded from the chain: CredentialsProviderChain -> EnvironmentCredentialsProvider -> ProfileCredentialsProvider -> StsWebIdentityProvider -> EcsCredentialsProvider -> ImdsCredentialsProvider
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1807)
at com.example.Executor.execute$lambda-1(ContextAwareExecutor.kt:42)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Expected behavior
Getting credentials while running on ECS should not arbitrarily fail
Current behavior
Getting credentials while running on ECS is arbitrarily failing
Steps to Reproduce
Run SqsClient.fromEnvironment()
on ECS with an execution role configured in a hot loop
Possible Solution
there is both a SocketException
and a DeserializationException
in the stack provided, which seem like a start
Context
No response
AWS Kotlin SDK version used
0.17.0-beta
Platform (JVM/JS/Native)
JVM
Operating System and version
fargate provider for ECS
Thanks for the report. I'm currently attempting to reproduce.
I've reproduced this.
The container metadata service appears to be rate-limiting, it returns this string if you try to obtain credentials too many times in a short period:
You have reached maximum request limit.
(notice the "unexpected token Y
" in the deserialization exception)
What's concerning to me is that this is returned as a raw string instead of a json payload, which is what we expect from the API. We even have code in this credentials provider to handle json-based errors that it might return. I'm hesitant to simply key off of this message given that it's a string out of band, I think there's a good chance it's not part of the credential service's modeled API and therefore it's not something we should rely on receiving. We'll need to discuss amongst the team how we want to handle this error, if at all.
In the meantime, you should be able to avoid this if you create a single client for message polling.
Thanks for the deep look, and the rate limit explanation makes a lot of sense with the pattern that we saw: a lot of successes on the first run and then a sudden error, and upon restart a much shorter time before the error.
I agree that not doing the naive authenticating every iteration is the solution for us, but just wanted to be sure to bring the error up here as it was still surprising and would hope to be able to catch some specific ClientException
in the future.
Thanks again!
⚠️COMMENT VISIBILITY WARNING⚠️
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.