aws-sdk-kotlin
Investigate ways to reduce final jar size
The dynamodb jar size is ~5MB, which is approximately twice as big as aws-sdk-java-v2.
The task here is to investigate ways to reduce the final size. I did some preliminary investigation and there is definitely some low hanging fruit.
Investigation Results
Below are quick investigations into the dynamodb class files and some opportunities identified to reduce the overall size.
tl;dr Some things contributing to our overall size:
- compiler generated state machines to support suspend at the deserializer level (which we don't even use)
- backing classes for lambdas in operation middleware (and probably elsewhere)
- Model classes still need to be investigated; some seem large for what they contain (e.g. CreateTableRequest.class is 10kb)
Use of suspend in deserializers
> ls CreateTableOperationDeserializerKt*
CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2.class
CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$OBJ_DESCRIPTOR$1.class
CreateTableOperationDeserializerKt$throwCreateTableError$1.class
CreateTableOperationDeserializerKt.class
> javap CreateTableOperationDeserializerKt*
Compiled from "CreateTableOperationDeserializer.kt"
final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2 extends kotlin.coroutines.jvm.internal.SuspendLambda implements kotlin.jvm.functions.Function2<aws.smithy.kotlin.runtime.serde.Deserializer$FieldIterator, kotlin.coroutines.Continuation<? super kotlin.Unit>, java.lang.Object> {
java.lang.Object L$1;
int label;
final aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor $TABLEDESCRIPTION_DESCRIPTOR;
final aws.sdk.kotlin.services.dynamodb.model.CreateTableResponse$DslBuilder $builder;
final aws.smithy.kotlin.runtime.serde.json.JsonDeserializer $deserializer;
aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2(aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor, aws.sdk.kotlin.services.dynamodb.model.CreateTableResponse$DslBuilder, aws.smithy.kotlin.runtime.serde.json.JsonDeserializer, kotlin.coroutines.Continuation<? super aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2>);
public final java.lang.Object invokeSuspend(java.lang.Object);
public final kotlin.coroutines.Continuation<kotlin.Unit> create(java.lang.Object, kotlin.coroutines.Continuation<?>);
public final java.lang.Object invoke(aws.smithy.kotlin.runtime.serde.Deserializer$FieldIterator, kotlin.coroutines.Continuation<? super kotlin.Unit>);
public java.lang.Object invoke(java.lang.Object, java.lang.Object);
}
Compiled from "CreateTableOperationDeserializer.kt"
final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$OBJ_DESCRIPTOR$1 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.smithy.kotlin.runtime.serde.SdkObjectDescriptor$DslBuilder, kotlin.Unit> {
final aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor $TABLEDESCRIPTION_DESCRIPTOR;
aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$OBJ_DESCRIPTOR$1(aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor);
public final void invoke(aws.smithy.kotlin.runtime.serde.SdkObjectDescriptor$DslBuilder);
public java.lang.Object invoke(java.lang.Object);
}
Compiled from "CreateTableOperationDeserializer.kt"
final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$throwCreateTableError$1 extends kotlin.coroutines.jvm.internal.ContinuationImpl {
java.lang.Object L$0;
java.lang.Object L$1;
java.lang.Object result;
int label;
aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$throwCreateTableError$1(kotlin.coroutines.Continuation<? super aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$throwCreateTableError$1>);
public final java.lang.Object invokeSuspend(java.lang.Object);
}
Compiled from "CreateTableOperationDeserializer.kt"
public final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt {
public static final java.lang.Object access$throwCreateTableError(aws.smithy.kotlin.runtime.client.ExecutionContext, aws.smithy.kotlin.runtime.http.response.HttpResponse, kotlin.coroutines.Continuation);
public static final java.lang.Object access$deserializeCreateTableOperationBody(aws.sdk.kotlin.services.dynamodb.model.CreateTableResponse$DslBuilder, byte[], kotlin.coroutines.Continuation);
}
This is what gets generated for one operation deserializer. There is overhead from the backing state machines the compiler emits to implement deserialization as suspend. This doesn't explain the model package size, though, since the model classes have no suspend functionality.
A quick test to remove suspend from our deserializer interface and codegen:
10064 -rw-r--r-- 1 todaaron staff 4.9M Oct 27 14:52 dynamodb-0.8.1-SNAPSHOT.jar
10248 -rw-r--r-- 1 todaaron staff 4.6M Nov 9 09:38 dynamodb-0.9.2-SNAPSHOT.jar
I wasn't able to remove all uses of it, so it could potentially be even smaller. Every suspend function generates a backing class for its state machine. We don't actually use suspend in our deserializer. It was added in the hope that we could process bytes off the wire as they arrive, but the work on replacing gson showed that using suspend at the tokenizer level resulted in terrible performance. I doubt we'll ever implement deserialization using suspend outside of a custom use case, since 99% of the documents that come back are going to be small enough that reading them into memory and deserializing directly from a ByteArray is the right move (and larger payloads should be paginated anyway).
If we needed to support incremental parsing/tokenization, we would adapt the tokenizer to work off chunks rather than use suspend. The tokenizer could return e.g. a sealed class that indicates either a token or that the input is incomplete and more data is needed.
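A rough sketch of what such a chunk-driven tokenizer could look like is below. All names here (TokenResult, ChunkTokenizer, tokenizeChunks) are hypothetical and the comma-separated "grammar" is a toy; this is not the smithy-kotlin tokenizer, only an illustration of returning a sealed result instead of suspending.

```kotlin
// Hypothetical names throughout; this is not the smithy-kotlin tokenizer API.
sealed class TokenResult {
    /** A complete token was read from the buffered input. */
    data class Token(val text: String) : TokenResult()

    /** The buffered input ended mid-token; the caller should feed more data. */
    object Incomplete : TokenResult()
}

/**
 * Toy chunk-driven tokenizer: a token is a run of characters terminated by a comma.
 * Nothing here is `suspend`, so no coroutine state machine classes are involved.
 */
class ChunkTokenizer {
    private val buffer = StringBuilder()

    /** Append the next chunk as it arrives. */
    fun feed(chunk: String) {
        buffer.append(chunk)
    }

    /** Return the next complete token, or Incomplete if more data is needed. */
    fun next(): TokenResult {
        val end = buffer.indexOf(",")
        if (end < 0) return TokenResult.Incomplete
        val token = buffer.substring(0, end)
        buffer.delete(0, end + 1) // drop the token and its terminator
        return TokenResult.Token(token)
    }
}

/** Example driver: feeds chunks one at a time and collects every complete token. */
fun tokenizeChunks(chunks: List<String>): List<String> {
    val tokenizer = ChunkTokenizer()
    val tokens = mutableListOf<String>()
    for (chunk in chunks) {
        tokenizer.feed(chunk)
        var result = tokenizer.next()
        while (result is TokenResult.Token) {
            tokens += result.text
            result = tokenizer.next()
        }
        // result is Incomplete here: wait for the next chunk
    }
    return tokens
}
```

The caller owns the loop that waits for more data, so only that outer layer would need to be suspend (if anything does); the tokenizer itself stays a plain synchronous class.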
Operation Middleware
Operation middleware classes (uncompressed) add up to 1.1M. The default client is 700K. I'm wondering if we can find a different way to generate this:
internal fun registerCreateTableMiddleware(config: DynamoDbClient.Config, op: SdkHttpOperation<CreateTableRequest, CreateTableResponse>) {
    op.apply {
        install(ResolveAwsEndpoint) {
            serviceId = ServiceId
            resolver = config.endpointResolver
        }
        install(RetryFeature) {
            strategy = config.retryStrategy
            policy = AwsDefaultRetryPolicy
        }
        install(AwsJsonProtocol) {
            serviceShapeName = "DynamoDB_20120810"
            version = "1.0"
        }
        install(UserAgent) {
            staticMetadata = awsUserAgentMetadata
        }
        install(AwsSigV4SigningMiddleware) {
            this.credentialsProvider = config.credentialsProvider
            this.signingService = "dynamodb"
        }
    }
}
> javap OperationMiddlewareKt\$registerCreateTableMiddleware*
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$1 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.http.middleware.ResolveAwsEndpoint$Config, kotlin.Unit> {
final aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config $config;
aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$1(aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config);
public final void invoke(aws.sdk.kotlin.runtime.http.middleware.ResolveAwsEndpoint$Config);
public java.lang.Object invoke(java.lang.Object);
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$2 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.smithy.kotlin.runtime.http.middleware.Retry$Config, kotlin.Unit> {
final aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config $config;
aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$2(aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config);
public final void invoke(aws.smithy.kotlin.runtime.http.middleware.Retry$Config);
public java.lang.Object invoke(java.lang.Object);
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$3 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.protocol.json.AwsJsonProtocol$Config, kotlin.Unit> {
public static final aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$3 INSTANCE;
aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$3();
public final void invoke(aws.sdk.kotlin.runtime.protocol.json.AwsJsonProtocol$Config);
public java.lang.Object invoke(java.lang.Object);
static {};
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$4 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.http.middleware.UserAgent$Config, kotlin.Unit> {
public static final aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$4 INSTANCE;
aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$4();
public final void invoke(aws.sdk.kotlin.runtime.http.middleware.UserAgent$Config);
public java.lang.Object invoke(java.lang.Object);
static {};
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$5 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.auth.signing.AwsSigV4SigningMiddleware$Config, kotlin.Unit> {
final aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config $config;
aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$5(aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config);
public final void invoke(aws.sdk.kotlin.runtime.auth.signing.AwsSigV4SigningMiddleware$Config);
public java.lang.Object invoke(java.lang.Object);
}
Looks like it's creating a class for every lambda passed to install. We could easily get rid of all of this. I've been thinking about this anyway. We still want middleware to be per-operation, but we could also have a class of middleware that exists per-client and is only created once. Most middleware (in fact maybe all of it) doesn't retain any state. Thus we could have a single instance per client rather than allocating per operation.
We could also revisit the whole Feature interface. Most of the time it's not necessary and we could implement Middleware directly. This would cut down on the number of backing lambda classes that are generated behind the scenes.
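To make the idea concrete, here is a minimal sketch of middleware implemented directly and built once per client. The Request, Middleware, Operation, and ClientMiddleware types are simplified stand-ins invented for illustration, as is the X-Signing-Service header; they are not the smithy-kotlin runtime types and this is not necessarily how the refactoring was done.

```kotlin
// Simplified stand-ins for illustration; not the smithy-kotlin runtime types.
class Request(val headers: MutableMap<String, String> = mutableMapOf())

interface Middleware {
    fun handle(request: Request)
}

class Operation(val name: String) {
    val middleware = mutableListOf<Middleware>()

    fun install(m: Middleware) {
        middleware += m
    }

    /** Run every installed middleware over a fresh request. */
    fun execute(): Request {
        val request = Request()
        for (m in middleware) m.handle(request)
        return request
    }
}

/** Configured through its constructor, so no `install(X) { ... }` lambda class is needed. */
class UserAgentMiddleware(private val userAgent: String) : Middleware {
    override fun handle(request: Request) {
        request.headers["User-Agent"] = userAgent
    }
}

/** Hypothetical header name; real signing is far more involved. */
class SigningMiddleware(private val signingService: String) : Middleware {
    override fun handle(request: Request) {
        request.headers["X-Signing-Service"] = signingService
    }
}

/** Stateless middleware created once per client and reused by every operation. */
class ClientMiddleware(userAgent: String, signingService: String) {
    private val shared: List<Middleware> = listOf(
        UserAgentMiddleware(userAgent),
        SigningMiddleware(signingService)
    )

    fun registerOn(op: Operation) {
        for (m in shared) op.install(m)
    }
}
```

With this shape, registering an operation is a plain loop over already-built instances, so the per-operation registration code contains no configuration lambdas for the compiler to back with classes.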
As an update here, we did end up refactoring the middleware: https://github.com/awslabs/smithy-kotlin/pull/536
Looks like we are currently down to ~3.1MB.
The current Java V2 SDK sits at ~2.2MB.
A few other areas to look into:
- There are some areas in generated serde that contribute to unnecessary size (e.g. use of this)
- The way we generate and throw operation errors contributes to the overall size. We could probably just generate a single throwServiceError function rather than per-operation functions (we actually had something similar at one point, where all errors were registered in a single place). This applies at least to AWS services, since they rely on an errorCode for matching.
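As a sketch of the single-function idea, assuming the error code has already been extracted from the response: the exception classes below are hypothetical stand-ins for generated types, and the signature of throwServiceError is a guess for illustration, not existing codegen output.

```kotlin
// Hypothetical stand-ins for the generated exception types.
open class ServiceException(message: String) : Exception(message)
class ResourceInUseException(message: String) : ServiceException(message)
class ResourceNotFoundException(message: String) : ServiceException(message)

/**
 * One service-wide function that matches on the error code, instead of one
 * throwXxxError function (plus its backing classes) per operation.
 */
fun throwServiceError(errorCode: String?, message: String): Nothing =
    throw when (errorCode) {
        "ResourceInUseException" -> ResourceInUseException(message)
        "ResourceNotFoundException" -> ResourceNotFoundException(message)
        // Unknown or missing codes fall back to the generic service exception.
        else -> ServiceException(message)
    }
```

A plain `when` is used here rather than a map of constructor references, since callable references would themselves need backing classes.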
I'll leave this open for tracking and +1's but the low hanging fruit is probably gone at this point.