aws-sdk-java-v2
aws-sdk-java-v2 copied to clipboard
Regression in 2.28.6: huge increase in exceptions thrown but caught
/### Describe the bug
After upgrading to 2.30.x (we tried both .0 and .25), we see a huge increase in the number of NumberFormatException thrown by the SDK from e.g. async DynamoDB request (and potentially others, too). Even though they are caught by the SDK, the sheer number of exceptions caused noticeable increased pressure on GC.
Regression Issue
- [x] Select this option if this issue appears to be a regression.
Expected Behavior
No NumberFormatException, which is the case in ~2.29.52~ 2.28.5.
Current Behavior
Lots of NumberFormatException, which is the case in ~2.30.x~ 2.28.6.
Reproduction Steps
Send any async DynamoDB request and either set a breakpoint or use a profiler to observe the caught exceptions.
Possible Solution
No response
Additional Information/Context
A typical stack trace:
NumberFormatException.<init>(String)
NumberFormatException.forCharSequence(CharSequence, int, int, int)
Integer.parseInt(CharSequence, int, int, int)
URI$Parser.scanByte(int, int)
URI$Parser.scanIPv4Address(int, int, boolean)
URI$Parser.parseIPv4Address(int, int)
URI$Parser.parseServer(int, int, boolean)
URI$Parser.parseAuthority(int, int)
URI$Parser.parseHierarchical(int, int)
URI$Parser.parse(boolean)
URI.<init>(String)
URI.create(String)
SdkHttpRequest.getUri()
V4CanonicalRequest.getCanonicalUri(SdkHttpRequest, V4CanonicalRequest$Options)
V4CanonicalRequest.canonicalUri()
V4CanonicalRequest.getCanonicalRequestString()
DefaultV4RequestSigner.sign(SdkHttpRequest$Builder)
V4RequestSigner.lambda$header$0(V4Properties, SdkHttpRequest$Builder)
DefaultAwsV4HttpSigner.doSign(SignRequest, Checksummer, V4RequestSigner, V4PayloadSigner)
DefaultAwsV4HttpSigner.sign(SignRequest)
AsyncSigningStage.doSraSign(SdkHttpFullRequest, RequestExecutionContext, SelectedAuthScheme, Identity)
AsyncSigningStage.lambda$sraSignRequest$1(SdkHttpFullRequest, RequestExecutionContext, SelectedAuthScheme, Identity)
MetricUtils.reportDuration(Supplier, MetricCollector, SdkMetric)
AsyncSigningStage.lambda$sraSignRequest$3(SdkHttpFullRequest, RequestExecutionContext, SelectedAuthScheme, Identity)
CompletableFuture.uniComposeStage(Executor, Function)
CompletableFuture.thenCompose(Function)
AsyncSigningStage.sraSignRequest(SdkHttpFullRequest, RequestExecutionContext, SelectedAuthScheme)
AsyncSigningStage.execute(SdkHttpFullRequest, RequestExecutionContext)
AsyncSigningStage.execute(Object, RequestExecutionContext)
RequestPipelineBuilder$ComposingRequestPipelineStage.execute(Object, RequestExecutionContext)
AsyncApiCallAttemptMetricCollectionStage.execute(SdkHttpFullRequest, RequestExecutionContext)
AsyncApiCallAttemptMetricCollectionStage.execute(Object, RequestExecutionContext)
AsyncRetryableStage$RetryingExecutor.attemptExecute(CompletableFuture)
AsyncRetryableStage$RetryingExecutor.attemptFirstExecute(CompletableFuture)
AsyncRetryableStage$RetryingExecutor.execute()
AsyncRetryableStage.execute(SdkHttpFullRequest, RequestExecutionContext)
AsyncRetryableStage.execute(Object, RequestExecutionContext)
RequestPipelineBuilder$ComposingRequestPipelineStage.execute(Object, RequestExecutionContext)
AsyncExecutionFailureExceptionReportingStage.execute(SdkHttpFullRequest, RequestExecutionContext)
AsyncExecutionFailureExceptionReportingStage.execute(Object, RequestExecutionContext)
AsyncApiCallTimeoutTrackingStage.execute(SdkHttpFullRequest, RequestExecutionContext)
AsyncApiCallTimeoutTrackingStage.execute(Object, RequestExecutionContext)
AsyncApiCallMetricCollectionStage.execute(SdkHttpFullRequest, RequestExecutionContext)
AsyncApiCallMetricCollectionStage.execute(Object, RequestExecutionContext)
RequestPipelineBuilder$ComposingRequestPipelineStage.execute(Object, RequestExecutionContext)
AmazonAsyncHttpClient$RequestExecutionBuilderImpl.execute(TransformingAsyncResponseHandler)
BaseAsyncClientHandler.invoke(SdkClientConfiguration, SdkHttpFullRequest, AsyncRequestBody, SdkRequest, ExecutionContext, TransformingAsyncResponseHandler)
BaseAsyncClientHandler.doExecute(ClientExecutionParams, ExecutionContext, TransformingAsyncResponseHandler)
BaseAsyncClientHandler.lambda$execute$1(ClientExecutionParams)
BaseAsyncClientHandler.measureApiCallSuccess(ClientExecutionParams, Supplier)
BaseAsyncClientHandler.execute(ClientExecutionParams)
AwsAsyncClientHandler.execute(ClientExecutionParams)
DefaultDynamoDbAsyncClient.lambda$query$203(HttpResponseHandler, HttpResponseHandler, SdkClientConfiguration, MetricCollector, QueryRequest, URI)
CompletableFuture.uniComposeStage(Executor, Function)
CompletableFuture.thenCompose(Function)
DefaultDynamoDbAsyncClient.query(QueryRequest)
AWS Java SDK version used
2.28.6 (also reproduced on 2.29.52, 2.30.0 and 2.30.25)
JDK version used
Temurin OpenJDK 21
Operating System and version
Ubuntu Jammy
@Hexcles can you share a SSCCE code example we can use to reproduce?
Unfortunately I don't have the bandwidth to do that this month. I used all my spare cycles on bisecting to narrow down the regression to a single release (2.29.52..2.30.0) and it's no longer urgent for us as we pin the version to 2.29. Hopefully with the stack trace you can find somewhere to start? Apologies and thank you.
The hint we get from the stacktrace is that the exception is coming from parsing the request canonical URI:
NumberFormatException.<init>(String)
NumberFormatException.forCharSequence(CharSequence, int, int, int)
Integer.parseInt(CharSequence, int, int, int)
URI$Parser.scanByte(int, int)
URI$Parser.scanIPv4Address(int, int, boolean)
URI$Parser.parseIPv4Address(int, int)
URI$Parser.parseServer(int, int, boolean)
URI$Parser.parseAuthority(int, int)
URI$Parser.parseHierarchical(int, int)
URI$Parser.parse(boolean)
URI.<init>(String)
URI.create(String)
SdkHttpRequest.getUri()
V4CanonicalRequest.getCanonicalUri(SdkHttpRequest, V4CanonicalRequest$Options)
V4CanonicalRequest.canonicalUri()
V4CanonicalRequest.getCanonicalRequestString()
...
Are you overriding the request endpoint, or using the default configuration? Can you share a snippet of how you're creating the DynamoDB client?
First of all, apologies that I apparently got the bisecting wrong -- we are down to (2.27.4 .. 2.29.52) instead. I'll see if I can get a better/smaller reproduction to help with my own bisecting, too.
And here's another stack trace with line numbers captured from a debugger in 2.29.52
NumberFormatException.<init>(String):L#54
NumberFormatException.forCharSequence(CharSequence, int, int, int):L#83
Integer.parseInt(CharSequence, int, int, int):L#743
URI$Parser.scanByte(int, int):L#3429
URI$Parser.scanIPv4Address(int, int, boolean):L#3459
URI$Parser.parseIPv4Address(int, int):L#3493
URI$Parser.parseServer(int, int, boolean):L#3392
URI$Parser.parseAuthority(int, int):L#3304
URI$Parser.parseHierarchical(int, int):L#3240
URI$Parser.parse(boolean):L#3196
URI.<init>(String):L#645
URI.create(String):L#930
SdkHttpRequest.getUri():L#157
RequestAdapter.addHeadersToRequest(DefaultHttpRequest, SdkHttpRequest):L#83
RequestAdapter.adapt(SdkHttpRequest):L#56
NettyRequestExecutor.makeRequest():L#236
NettyRequestExecutor.lambda$makeRequestListener$10():L#181
NettyUtils.doInEventLoop(EventExecutor, Runnable):L#248
NettyRequestExecutor.makeRequestListener(Future):L#177
DefaultPromise.notifyListener0(Future, GenericFutureListener):L#590
DefaultPromise.notifyListeners0(DefaultFutureListeners):L#583
DefaultPromise.notifyListenersNow():L#559
DefaultPromise.access$200(DefaultPromise):L#35
DefaultPromise$1.run():L#503
PromiseTask.runTask():L#98
PromiseTask.run():L#106
AbstractEventExecutor.runTask(Runnable):L#173
AbstractEventExecutor.safeExecute(Runnable):L#166
SingleThreadEventExecutor.runAllTasks(long):L#472
EpollEventLoop.run():L#405
SingleThreadEventExecutor$4.run():L#998
ThreadExecutorMap$2.run():L#74
Thread.runWith(Object, Runnable):L#1596
Thread.run():L#1583
Also, I got the error message: Error at index 9 in: "<redcated>" where redacted is actually our account ID. We don't override our dynamodb endpoint and based on our tracing it's dynamodb.us-east-1.amazonaws.com.
Also note that these exceptions are caught by SDK and the requests do succeed in the end
Oh I think it's https://github.com/aws/aws-sdk-java-v2/releases/tag/2.28.6
Generate account endpoint for DynamoDB requests when the account ID is available
And internally within the SDK we got <account-id>.ddb.us-east-1.amazonaws.com. Then Uri.create in JDK does this exception-as-control-flow thing as it first tries to parse the domain as an IP: https://github.com/frohoff/jdk8u-jdk/blob/da0da73ab82ed714dc5be94acd2f0d00fbdfe2e9/src/share/classes/java/net/URI.java#L3335
That's really unfortunate. Is there a way for us to turn off account-based endpoint for DynamoDB?
Hi @Hexcles,
Thanks for sharing the insight. For turning off this feature, you may refer to below section from the announcement - https://github.com/aws/aws-sdk-java-v2/discussions/5628
You may instead disable account-based endpoints for DynamoDB by setting the config flag accountIdEndpointMode to disabled, as a system property or environment variable. We recommend against this because the DynamoDB service uses account-based endpoints to ensure performance and scalability.
Regards, Chaitanya
@Hexcles the Java SDK 2.28.6 should work fine with DDB account-based endpoints, so I'm curious about what could be causing the NumberFormatException.
Are you using aws endpoints directly, or using a third-party solution endpoint? Have you had the chance to work on a repro code?
@debora-ito Let me try to clarify. Everything "works fine"; there's no uncaught exception or any impact on functionalities.
However, there is a non-zero impact on performance due to the sharply increased number of exceptions thrown (and caught) from within java.net.URI due to how the account-based endpoint is structured (it starts with a long sequence of digits) and how the standard JDK implements the parsing (exception as control flow).
To repro, you don't even need the SDK; just run the following code in a debugger and set a breakpoint on any exception thrown (including caught):
import java.net.URI;
public class Main {
public static void main(String[] args) {
var uri = URI.create("98765432101.ddb.us-east-1.amazonaws.com");
System.out.println(uri);
}
}
There's no exception thrown for e.g. dynamodb.us-east-1.amazonaws.com.
Oh I see now, thank you for the clarification.
~~Can you measure the performance impact somehow? Depending on the impact we may want to take a closer look.~~
Update: please disregard my strikethrough comment, we'll take a look. Thank you for raising this.
Hi @Hexcles,
Thanks for reporting this issue. I have spent a good amount of time benchmarking (using JMH) the actual impact caused by the account ID endpoint and the results were interesting.
When testing URI.create() directly, I can indeed see a degradation in overall performance in terms of %, but the overall impact is still negligible when we are looking at the actual time. The benchmark results show that creating URIs with account IDs is significantly slower than standard URIs, with account ID URIs taking about 3-4 times longer on average (3.218 μs vs 0.830 μs) however translating these units to milliseconds you will have to make 1000 requests per/second to experience an addition of 1ms to the overall execution.
Benchmark Mode Cnt Score Error Units
uriCreateAccountIdEndpoint thrpt 10 0.311 ± 0.008 ops/us
uriCreateStandardEndpoint thrpt 10 1.222 ± 0.004 ops/us
uriCreateAccountIdEndpoint avgt 10 3.218 ± 0.115 us/op
uriCreateStandardEndpoint avgt 10 0.830 ± 0.001 us/op
uriCreateAccountIdEndpoint sample 1906430 3.315 ± 0.009 us/op
uriCreateAccountIdEndpoint:p0.00 sample 3.124 us/op
uriCreateAccountIdEndpoint:p0.50 sample 3.268 us/op
uriCreateAccountIdEndpoint:p0.90 sample 3.328 us/op
uriCreateAccountIdEndpoint:p0.95 sample 3.360 us/op
uriCreateAccountIdEndpoint:p0.99 sample 3.572 us/op
uriCreateAccountIdEndpoint:p0.999 sample 14.656 us/op
uriCreateAccountIdEndpoint:p0.9999 sample 21.579 us/op
uriCreateAccountIdEndpoint:p1.00 sample 1865.728 us/op
uriCreateStandardEndpoint sample 3772995 0.882 ± 0.003 us/op
uriCreateStandardEndpoint:p0.00 sample 0.764 us/op
uriCreateStandardEndpoint:p0.50 sample 0.869 us/op
uriCreateStandardEndpoint:p0.90 sample 0.918 us/op
uriCreateStandardEndpoint:p0.95 sample 0.933 us/op
uriCreateStandardEndpoint:p0.99 sample 0.960 us/op
uriCreateStandardEndpoint:p0.999 sample 2.120 us/op
uriCreateStandardEndpoint:p0.9999 sample 16.160 us/op
uriCreateStandardEndpoint:p1.00 sample 2158.592 us/op
uriCreateAccountIdEndpoint ss 10 27.228 ± 1.305 us/op
uriCreateStandardEndpoint ss 10 17.525 ± 3.704 us/op
(Using nano seconds here please forgive the change in unit of measurment) When testing the a stubbed SDK client (not measuring roundtripping time, and only focusing on building the request and resolving the endpoint) the results are negligible in terms of % ( p90 is 0.1% slower and p99 is 0.3% slower).
Benchmark Mode Cnt Score Error Units
Foo.withAccountEndpoints thrpt 100 ≈ 10⁻⁸ ops/ns
Foo.withoutAccountEndpoints thrpt 100 ≈ 10⁻⁸ ops/ns
Foo.withAccountEndpoints avgt 100 55598535.201 ± 208678.153 ns/op
Foo.withoutAccountEndpoints avgt 100 56713598.175 ± 537362.920 ns/op
Foo.withAccountEndpoints sample 1770 58772499.670 ± 248637.587 ns/op
Foo.withAccountEndpoints:p0.00 sample 51707904.000 ns/op
Foo.withAccountEndpoints:p0.50 sample 59179008.000 ns/op
Foo.withAccountEndpoints:p0.90 sample 62193664.000 ns/op
Foo.withAccountEndpoints:p0.95 sample 62783488.000 ns/op
Foo.withAccountEndpoints:p0.99 sample 65077248.000 ns/op
Foo.withAccountEndpoints:p0.999 sample 74617978.880 ns/op
Foo.withAccountEndpoints:p0.9999 sample 75628544.000 ns/op
Foo.withAccountEndpoints:p1.00 sample 75628544.000 ns/op
Foo.withoutAccountEndpoints sample 1776 58508075.820 ± 255582.120 ns/op
Foo.withoutAccountEndpoints:p0.00 sample 51052544.000 ns/op
Foo.withoutAccountEndpoints:p0.50 sample 59146240.000 ns/op
Foo.withoutAccountEndpoints:p0.90 sample 62128128.000 ns/op
Foo.withoutAccountEndpoints:p0.95 sample 62717952.000 ns/op
Foo.withoutAccountEndpoints:p0.99 sample 64880640.000 ns/op
Foo.withoutAccountEndpoints:p0.999 sample 70123520.000 ns/op
Foo.withoutAccountEndpoints:p0.9999 sample 70123520.000 ns/op
Foo.withoutAccountEndpoints:p1.00 sample 70123520.000 ns/op
Foo.withAccountEndpoints ss 100 60984126.670 ± 1522649.960 ns/op
Foo.withoutAccountEndpoints ss 100 60812634.540 ± 1114374.923 ns/op
With regards to GC pressure, I used both JMH profiler and Java Flight Recorder to try and profile GC and I'm not seeing any difference in GC between the two scenarios.
Are you able to provide us with the profiling setup you have to measure the excessive GC? I'm having a hard time reproducing it on my end.
Thanks, Ran~
@RanVaknin thanks for your detailed analysis! I agree with your results.
I did some digging, too, and couldn't come up with a standalone repro that can show meaningful performance difference, either. My current best guess is it has something to do with our APM setup. We have a relatively high sample rate for exceptions (including caught ones) in our java agent, which would capture the exception, attach the full stacktrace and error message to the current span (it's a big JSON), and create a profiling event. We usually have <10 exceptions per second, but this pushed our exception rate to >>100 per second (aside: this made many of our internal exception metrics useless/noisy, too -- we otherwise have a good exception hygiene and only use them for truly exceptional cases, which is also why we can get away with a high exception sample rate). And we can't easily exclude this exception as the skip list is by type, and NumberFormatException is a built-in exception that our business code can also throw.
I guess one thing that's not clear to me is the advantage of using this account-specific endpoint.
The JDK API URI.create can be expensive for some specific cases, this one is caused by the NumberFormatException, I fixed another one caused by URISyntaxException recently. Even w/o the exception URI.create is expensive(We found this in a project helping internal team to optimize service performance) because by default DefaultV4RequestSigner parses URI using URI.create for every single request to normalize URI path, I think URI path normalization can be disabled for DDB client to avoid the expensive URI parsing, just like this PR did for S3.
Hi @Hexcles,
Thanks for the follow up. I definitely agree that the increase in exception caught and thrown is an issue. I have contacted the Corretto team on Amazon to see if they can help with driving an improvement in this JDK's implementation of URI.create().
I guess one thing that's not clear to me is the advantage of using this account-specific endpoint.
Unfortunately I was not able to find any public facing announcement from the Dynamodb team, and do not want to share incorrect information so I can't really speak to the why right now. I can only refer you to the official documentation but it doesn't provide the reasoning. Once there is an official public wording about it I will definitely share it here.
@pengxiaolong Thanks for the additional context. I'll make sure to share it with the Corretto team. I see there's an existing PR. I hope the JDK team can also prioritize this work to help out.
Since we rely on the JDK's URI functionality, this is not actionable by the Java SDK team at the moment. Additionally, because this feature is opt-in, I'm going to downgrade the severity. I'll make sure to keep you posted with any new developments.
Thanks, Ran~
Thanks for the detailed response, Ran! Though I should note that this feature (using account specific endpoint) is opt-out (on by default) not opt-in.
Sent from my cellphone.
On Sun, Mar 30, 2025, 17:04 Ran Vaknin @.***> wrote:
Hi @Hexcles https://github.com/Hexcles,
Thanks for the follow up. I definitely agree that the increase in exception caught and thrown is an issue. I have contacted the Corretto team on Amazon to see if they can help with driving an improvement in this JDK's implementation of URI.create().
I guess one thing that's not clear to me is the advantage of using this account-specific endpoint.
Unfortunately I was not able to find any public facing announcement from the Dynamodb team, and do not want to share incorrect information so I can't really speak to the why right now. I can only refer you to the official documentation but it doesn't provide the reasoning. Once there is an official public wording about it I will definitely share it here.
@pengxiaolong https://github.com/pengxiaolong Thanks for the additional context. I'll make sure to share it with the Corretto team. I see there's an existing PR. I hope the JDK team can also prioritize this work to help out.
Since we rely on the JDK's URI functionality, this is not currently actionable by the Java SDK team at the moment. Additionally, because this feature is opt-in, I'm going to downgrade the severity. I'll make sure to keep you posted with any new developments.
Thanks, Ran~
— Reply to this email directly, view it on GitHub https://github.com/aws/aws-sdk-java-v2/issues/5933#issuecomment-2764816909, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAK6ZDFUNCWUTRJ7FTSWG432XCA7DAVCNFSM6AAAAABYNJSDH6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONRUHAYTMOJQHE . You are receiving this because you were mentioned.Message ID: @.***> [image: RanVaknin]RanVaknin left a comment (aws/aws-sdk-java-v2#5933) https://github.com/aws/aws-sdk-java-v2/issues/5933#issuecomment-2764816909
Hi @Hexcles https://github.com/Hexcles,
Thanks for the follow up. I definitely agree that the increase in exception caught and thrown is an issue. I have contacted the Corretto team on Amazon to see if they can help with driving an improvement in this JDK's implementation of URI.create().
I guess one thing that's not clear to me is the advantage of using this account-specific endpoint.
Unfortunately I was not able to find any public facing announcement from the Dynamodb team, and do not want to share incorrect information so I can't really speak to the why right now. I can only refer you to the official documentation but it doesn't provide the reasoning. Once there is an official public wording about it I will definitely share it here.
@pengxiaolong https://github.com/pengxiaolong Thanks for the additional context. I'll make sure to share it with the Corretto team. I see there's an existing PR. I hope the JDK team can also prioritize this work to help out.
Since we rely on the JDK's URI functionality, this is not currently actionable by the Java SDK team at the moment. Additionally, because this feature is opt-in, I'm going to downgrade the severity. I'll make sure to keep you posted with any new developments.
Thanks, Ran~
— Reply to this email directly, view it on GitHub https://github.com/aws/aws-sdk-java-v2/issues/5933#issuecomment-2764816909, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAK6ZDFUNCWUTRJ7FTSWG432XCA7DAVCNFSM6AAAAABYNJSDH6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDONRUHAYTMOJQHE . You are receiving this because you were mentioned.Message ID: @.***>
@Hexcles ,
You are right. I meant to say opt-out.
Thanks, Ran~
Hi @Hexcles ,
Should be resolved with https://github.com/aws/aws-sdk-java-v2/pull/6244
Can you check and let us know?
Thanks, Ran~
It looks like this issue has not been active for more than five days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.
Thanks! I can confirm that the fix works for us.
Sent from my cellphone.
On Mon, Oct 6, 2025, 14:06 github-actions[bot] @.***> wrote:
Closed #5933 https://github.com/aws/aws-sdk-java-v2/issues/5933 as completed.
— Reply to this email directly, view it on GitHub https://github.com/aws/aws-sdk-java-v2/issues/5933#event-20126689581, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAK6ZDFU2L3JV57UB67Z3FD3WLKVBAVCNFSM6AAAAABYNJSDH6VHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMRQGEZDMNRYHE2TQMI . You are receiving this because you were mentioned.Message ID: @.***>