Initializing GeneratedServiceMetadataProvider takes a while
Describe the feature
When creating a service client (e.g. DynamoDbAsyncClient) there is an expensive one time call to create GeneratedServiceMetadataProvider which incurs the penalty of an expensive static initializer. The static initializer creates a map object with 299 entries (currently), each of which also initializes an instance that extends ServiceMetadata which also all have large static initialization overhead.
From the looks of it, every single instance of ServiceMetadata is initialized but in my use case, only DynamodbServiceMetadata was needed. And this call was only needed to acquire the client endpoint.
I profiled just the cold start initialization of GeneratedServiceMetadataProvider to confirm my suspicions and get a general idea of the overhead incurred. On my machine the operation took between 350 and 500 ms. I think this could be improved significantly by implementing lazy loading of ServiceMetadata objects.
Benchmark
@State(Scope.Benchmark)
@Warmup(iterations = 0)
@Measurement(iterations = 1)
@Fork(20)
public class GeneratedProviderBenchmark {
@Benchmark
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public void createObject() {
GeneratedServiceMetadataProvider generated = new GeneratedServiceMetadataProvider();
ServiceMetadata metadata = generated.serviceMetadata("xray");
}
public static void main(String... args) throws RunnerException, CommandLineOptionException {
Options opt = new OptionsBuilder()
.parent(new CommandLineOptions())
.include(GeneratedProviderBenchmark.class.getSimpleName())
.addProfiler(StackProfiler.class)
.build();
Collection<RunResult> run = new Runner(opt).run();
}
}
Results:
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| GeneratedProviderBenchmark.createObject | ss | 20 | 364.140 | ± 16.565 | ms/op |
Use Case
The goal is to reduce the overhead as much as possible and make cold start times as performant as they can be. This is an ongoing process that a number of other tickets are related to. In fact the ticket that lead to the creation of the GeneratedServiceMetadataProvider class was done so in an effort to reduce cold start times of the DynamoDbClient.
Proposed Solution
Lazily initialize the GeneratedServiceMetadataProvider using a static factory method. I created a proof of concept to test a possible improvement to the object initialization.
public final class GeneratedServiceMetadataProvider implements ServiceMetadataProvider {
private static final Map<String, ServiceMetadata> SERVICE_METADATA = new HashMap<>();
private static ServiceMetadata createServiceMetadata(String endpointPrefix) {
switch (endpointPrefix) {
case "a4b":
return new A4bServiceMetadata();
... // other cases
default:
throw new IllegalStateException("Unexpected value: " + endpointPrefix);
}
}
private static ServiceMetadata getServiceMetadata(String endpointPrefix) {
if (!SERVICE_METADATA.containsKey(endpointPrefix)) {
SERVICE_METADATA.put(endpointPrefix, createServiceMetadata(endpointPrefix));
}
return SERVICE_METADATA.get(endpointPrefix);
}
@Override
public ServiceMetadata serviceMetadata(String endpointPrefix) {
return getServiceMetadata(endpointPrefix);
}
}
This produced the following results using the same benchmark as above:
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| GeneratedProviderBenchmark.createObject | ss | 20 | 14.140 | ± 0.465 | ms/op |
About a 25x improvement in object creation speed, not to mention the memory overhead thats saved by deferring initialization of all the other ServiceMetadata instances that aren't needed.
Other Information
As an aside, I was surprised that providing an EndpointProvider to the client builder - one which doesn't go through GeneratedServiceMetadataProvider - wasn't used at all when setting the endpoint of the client on creation. Why wouldn't the client builder use the endpoint provider, if present, if it's going to use the provider for every request anyway?
Related issues: https://github.com/aws/aws-sdk-java-v2/issues/748 https://github.com/aws/aws-sdk-java-v2/issues/6
Acknowledgements
- [X] I may be able to implement this feature request
- [ ] This feature might incur a breaking change
AWS Java SDK version used
2
JDK version used
1.8
Operating System and version
macOS Monterey 12.6.1
Hi @awmcc90 thank you for the thorough report! We recently changed to a use EndpointProvider for endpoint resolution in place of generated service metadata classes, and I think it makes sense to lazily initialize it. We will take a look at your PR shortly. Thanks again for the detailed information.
As to the endpoint provider issue, it seems like a bug, could you create a new issue for it?
Thanks for your reply. I'm not sure I'm the right person to make the ticket about the EndpointProvider because I'm not sure how it's supposed to work. The only use of EndpointProvider that I can find are in various ExecutionInterceptor classes. Are you saying that GeneratedServiceMetadataProvider should never be called given there is always a default EndpointProvider?
Also, while looking into this I noticed that some of the codegen results in dead code. One example of that is this line in AsyncClientClass. That line needs to be inside the if statement on line 293 or else a variable is declared that is never used. In DefaultDynamoDbAsyncClient there are 38 instances of this unused variable declared. Should I create a ticket for that too?
Are you saying that GeneratedServiceMetadataProvider should never be called given there is always a default EndpointProvider?
Yeah, we are planning to make the change to remove the usage ofGeneratedServiceMetadataProvider from the request path.
That line needs to be inside the if statement on line 293 or else a variable is declared that is never used. In DefaultDynamoDbAsyncClient there are 38 instances of this unused variable declared. Should I create a ticket for that too?
Good catch, could you create a ticket for it?
Beyond being slow, this also allocates ~10MB based on a test with async-profiler. If none of this data is ever retrieved, then this is very wasteful, creating a spike of memory allocation.
Any update on this? I made a PR but that's way out of sync at this point.
Second this. Please fix the slow as hell startup of AWS Java SDKs