DynamoDB connection pool tied up when interrupting
Describe the bug
In my service, I was time-limiting a block of code that runs DynamoDB queries. After enough timeouts, I started seeing the following error: SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool. It appears that if I interrupt the DDB client, the connection stays tied up forever, and eventually no connection is available for further calls.
After several runs, I noticed that if I call future.cancel(false) (no interrupt) instead of future.cancel(true), the service remains stable, although the worker threads are then allowed to run to completion. I verified with the Java SDK client metrics that, when the interrupt is used, LeasedConcurrency goes up and never goes back down.
Expected behavior
When the DDB client aborts due to an interrupt (AbortedException), the HTTP connection is released.
Current behavior
When the DDB client aborts due to an interrupt, the HTTP connection is NOT released, which eventually depletes the HTTP connection pool.
Steps to Reproduce
Something like this:
final Future<?> future = executorService.submit(() -> {
    // some longer-running DDB calls
});
// Not final and initialized to null so it is definitely assigned in the finally block.
Object response = null;
try {
    response = future.get(MS_LIMIT_TO_RESPOND, TimeUnit.MILLISECONDS);
} catch (TimeoutException exception) {
    LOG.warn("Took longer than allotted {} ms to generate response.", MS_LIMIT_TO_RESPOND, exception);
} catch (InterruptedException | ExecutionException exception) {
    LOG.error("Failed to generate response", exception);
} finally {
    // cancel(true) interrupts the worker thread, which triggers the issue; cancel(false) does not.
    future.cancel(true);
    LOG.info("Returning response {}", response);
}
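For anyone who wants to see the difference between the two cancel modes without the SDK involved, here is a minimal JDK-only sketch. The class name CancelDemo and the Thread.sleep stand-in for the DynamoDB call are assumptions made for illustration; it only shows that cancel(true) delivers an interrupt to the running worker while cancel(false) lets it finish.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of the service or the SDK.
class CancelDemo {

    /** Starts a blocking task, cancels it, and reports whether the worker saw an interrupt. */
    static boolean workerWasInterrupted(boolean mayInterruptIfRunning) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        try {
            Future<?> future = executor.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(200); // stand-in for a slow DynamoDB call
                } catch (InterruptedException e) {
                    interrupted.set(true); // only reached when the thread is interrupted
                } finally {
                    finished.countDown();
                }
            });
            started.await();                      // make sure the task is actually running
            future.cancel(mayInterruptIfRunning); // true interrupts the worker, false does not
            finished.await(5, TimeUnit.SECONDS);  // wait for the worker to leave its try block
            return interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo thread was interrupted", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cancel(true)  interrupted worker: " + workerWasInterrupted(true));
        System.out.println("cancel(false) interrupted worker: " + workerWasInterrupted(false));
    }
}
```

In both cases the Future reports itself as cancelled to the caller; the only difference is whether the thread that is inside the client call receives the interrupt.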
Possible Solution
Other tickets I saw about connection pool timeouts were for S3 and involved closing the object to ensure the connection is released. I think that when the SDK throws AbortedException (or whichever exception signals the interrupt), the DDB connection should be closed.
Context
I was trying to limit the execution time on my service. If it didn't complete within a time limit, it would return an empty response.
AWS Java SDK version used
2
JDK version used
1.8
Operating System and version
Amazon Linux
My team is also encountering this problem. Our investigation so far seems to indicate that connections leased from the pool are mishandled in org.apache.http.impl.execchain.MainClientExec#execute.
If the thread is interrupted at the wrong time, the leased connection is lost and never returned to the pool. What can we do to fix this? Is it a known Apache client issue?
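To make the suspected failure mode concrete, here is a toy model of a leased-connection pool. It is not the Apache client's code and makes no claim about where exactly the real leak happens; ToyPool, runLeaky, and runSafe are hypothetical names, and the unchecked exception thrown by the caller's work stands in for a request that is aborted after the connection was leased.

```java
import java.util.concurrent.Semaphore;

// Toy model only: a Semaphore stands in for the HTTP connection pool.
class ToyPool {
    private final int size;
    private final Semaphore permits;

    ToyPool(int size) {
        this.size = size;
        this.permits = new Semaphore(size);
    }

    /** Number of connections currently leased (analogous to LeasedConcurrency). */
    int leased() {
        return size - permits.availablePermits();
    }

    /** Leaky variant: if work throws, the permit is never returned. */
    void runLeaky(Runnable work) {
        permits.acquireUninterruptibly();
        work.run();
        permits.release(); // skipped when work.run() throws
    }

    /** Safe variant: the permit is returned on every exit path. */
    void runSafe(Runnable work) {
        permits.acquireUninterruptibly();
        try {
            work.run();
        } finally {
            permits.release();
        }
    }
}
```

Each aborted call through runLeaky raises leased() by one and it never comes back down, which matches the symptom reported above: once leased() reaches the pool size, every further caller waits for a connection until it times out.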
@andrewyoo @jocull I'm sorry for losing track of this. Are you still experiencing the issue?
Are you closing the data stream after it's consumed from the query response? Issues with connections that are not being released are usually associated with the resources not being properly closed.
@debora-ito I don't understand how your question relates to this ticket. In my use case, I had a DDB client (DynamoDbClient.create()) and I was interrupting a query. Because I was interrupting early, there was no response or data stream to close.
As for whether I am still experiencing it: I have been avoiding interrupting the DDB requests so that I wouldn't hit this issue, so I can't confirm whether it is still a problem.
Our situation was the same as Andrew's - setting an interrupt on the thread running the request. We resorted to making requests with the async SDK and blocking on the results, but I would honestly prefer not to. It has been a year and we have not tried this again.
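For reference, the shape of that workaround looks roughly like the sketch below. The async client's operations return a CompletableFuture, so the caller can block with a timeout and give up without interrupting whatever thread is doing the I/O. AsyncTimeoutSketch and getWithin are hypothetical names, and the CompletableFuture<String> parameter is a stand-in for the future returned by the real async call; this is an approximation, not our production code.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; the String payload stands in for a real response type.
class AsyncTimeoutSketch {

    /** Blocks for at most timeoutMs and returns an empty Optional if no result arrives in time. */
    static Optional<String> getWithin(CompletableFuture<String> pending, long timeoutMs) {
        try {
            return Optional.ofNullable(pending.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // CompletableFuture.cancel ignores its boolean argument: no thread is interrupted.
            pending.cancel(false);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the call itself failed; treated as "no response" here
        }
    }
}
```

The trade-off is that the timed-out request is only marked as cancelled from the caller's point of view; whether the underlying HTTP call is actually stopped depends on the client, which is part of why I would prefer the synchronous API.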
I did mention above the suspect code in the Apache library. It's possible that has been patched now but I have not revisited Apache change logs.
We released a fix via https://github.com/aws/aws-sdk-java-v2/pull/4066.
The fix is available in Java SDK version 2.20.83.
@andrewyoo @jocull I know the fix is not easy to test due to the nature of the issue and because you changed to async, but let us know of any other issues you find after upgrading to a newer version.
@debora-ito We switched back to the synchronous SDK with the new version and tested both locally and in a load-tested environment. We could not reproduce the issue this time, so I believe it is fixed. Thank you! 😄