Binary size too big with s3 packages
Acknowledgements
- [X] I have searched (https://github.com/aws/aws-sdk/issues?q=is%3Aissue) for past instances of this issue
- [X] I have verified all of my SDK modules are up-to-date (you can perform a bulk update with
go get -u github.com/aws/aws-sdk-go-v2/...)
Describe the bug
When I add the s3 backend which uses the SDKv2 to rclone it increases the binary by 7.4 MiB and users have been complaining about the large binary sizes.
This makes the s3 backend by far the largest contributor to the rclone binary size (about 13%). Here are the top 11 backends ranked by size. Each size was measured by commenting the backend out, compiling in release mode, and comparing the resulting binary sizes.
| Backend | Size MiB |
|---|---|
| s3 | 7.43 |
| storj | 4.24 |
| protondrive | 2.61 |
| hdfs | 2.41 |
| oracleobjectstorage | 1.60 |
| drive | 1.53 |
| filescom | 1.46 |
| azureblob | 1.26 |
| dropbox | 1.15 |
| googlecloudstorage | 0.78 |
| azurefiles | 0.65 |
I compared this to the last version of rclone which used the SDKv1 and the s3 backend takes 6.9 MiB so this has got slightly worse. I expected this to get better with the modularization of the SDK :-(
Rclone uses the following sdk imports
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/aws/signer/v4"
"github.com/aws/aws-sdk-go-v2/credentials"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/aws/aws-sdk-go-v2/service/s3/types"
"github.com/aws/smithy-go"
"github.com/aws/smithy-go/logging"
"github.com/aws/smithy-go/middleware"
"github.com/aws/smithy-go/transport/http"
According to the rather neat tool go-size-analyzer, the bulk of this is in the service modules.
Is there any way this can be improved?
Thank you
Regression Issue
- [ ] Select this option if this issue appears to be a regression.
Expected Behavior
I expected the S3 packages from the SDK to add no more than 1MB to my binary
Current Behavior
It adds 7.5 MB to my binary
Reproduction Steps
The test_backend_sizes.py script will compile rclone many times to measure backend sizes, and the go-size-analyzer tool can be used to verify it by running it on the rclone binary.
Possible Solution
No response
Additional Information/Context
No response
AWS Go SDK V2 Module Versions Used
go mod graph as attachment as it is quite big.
Compiler and Version used
go version go1.23.4 linux/amd64
Operating System and version
Ubuntu 22.04.5 LTS
Hi @ncw,
Thanks for reaching out. S3 is indeed a large client because it is one of the biggest AWS services in terms of its API representation size. We can do more about trying to reduce the binary size but it will likely happen as a "side effect" of other efforts, for example https://github.com/aws/aws-sdk-go-v2/issues/2933.
I'm going to keep this as a backlog item to gauge community interest.
Thanks, Ran~
Hey just +1'ing this. Came here after trying to reduce a binary's size, found that s3 is the biggest dependency by a bit:
I've got a gRPC server with OTLP and a bunch of other stuff, I was expecting some other library to come out on top, but the s3 package in particular is top in size by a fair bit at ~12% of total size
The problem is with all services: S3 (4.1 MB), Route53 (2.7 MB), Lightsail (4.8 MB), etc.
An important root cause of the problem is the Client structure: the Client structure has methods, and each of those methods depends on other structures, related to operations.
To keep the structure intact, the compiler is forced to keep its methods, and therefore the structures, methods, and functions those methods reference.
A way to drastically reduce the size is to remove the Client methods related to typed operations and replace them with functions that take the lightweight client as a parameter.
I understand that this is not a beautiful solution, and if you are using all the methods, it results in the same size; however, this is the "easiest" way, and I think nobody uses all the hundreds of methods for a client.
As the code is generated, maybe it's possible to hack it and apply this pattern. I don't know how exactly smithy works, I need to investigate, but I can see a possibility to create automatic forks with lightweight clients.
I have the feeling that modifying this block can be a first step:
https://github.com/aws/smithy-go/blob/9aa9f7326a20363bbdf6c0cc70758f5b72136e50/codegen/smithy-go-codegen/src/main/java/software/amazon/smithy/go/codegen/OperationGenerator.java#L107-L120
from this:
writer.openBlock("func (c $P) $T(ctx $T, params $P, optFns ...func(*Options)) ($P, error) {", "}",
...
writer.write("result, metadata, err := c.invokeOperation(ctx, $S, params, optFns, c.$L)",
to this:
writer.openBlock("func $T(ctx $T, c $P, params $P, optFns ...func(*Options)) ($P, error) {", "}",
...
writer.write("result, metadata, err := c.invokeOperation(ctx, $S, params, optFns, $L)",
It's also possible to approach the problem differently: hide the need for the client by generating operations as structures, instead of methods or functions, each with a method invoke(c *Client).
But this requires a deeper change of the templates used to generate the service code.
That would be a colossal API break so we can't do that really.
I never said otherwise; that's why I'm talking about creating a fork. I was sure that creating a v3 was not a solution for you.
But my explanation of the problem is still right.
I tried to apply my hack.
Technically, it works, but the problem is the interaction with the elements that use interfaces based on methods of the client.
This is a problem when we need to interact with those elements like AssumeRoleProvider.
So, this is not a viable option for a low-maintenance fork.
I thought of ways to be non-breaking that can be applied inside this repository (no fork), and I found 2 possible solutions. There may be other ways, but those 2 approaches seem already viable.
I don't have the full overview of the API clients, so some elements could require adjustments.
My preference is for the "lightweight client" because it produces a more significant reduction in the binary size and requires fewer changes within the current code.
Solution 1: lightweight client
Neutralize the dependencies on Client
- Use an interface to abstract `Client`

      type BaseClient interface {
          TimeOffset() *atomic.Int64
          Options() Options
      }

- Add method `TimeOffset()` to `Client`
- Replace usage of `*Client` with the interface
  - `finalizeOperationRetryMaxAttempts`
  - `addTimeOffsetBuild`
  - `initializeTimeOffsetResolver` -> remove it and initialize the timeout inside the constructor `NewClient`
- Modify `stackFn` signature to `func(BaseClient, *middleware.Stack, Options)`
- Apply new `stackFn` signature where it's necessary
- Convert `addOperationXXX` unexposed methods into functions that use `c BaseClient` as an argument.
Create a new lightweight client
- Create a lightweight client `RawClient` with no "typed" methods (operation and related methods), only 3 methods:
  - `TimeOffset()`
  - `Options()`
  - `InvokeOperation()`
- Add constructor(s) (`NewFromConfig`, `New`, etc.)
- Expose `invokeOperation` from `RawClient`
- Expose `addOperationXXX` functions
Notes
To use it, the API client users have to create their method implementations based on the different exposed elements and interfaces.
This is very effective in terms of binary size reduction and has minimal changes to the existing code.
Solution 2: Lazy Methods
Neutralize the dependencies on Client
- Use an interface to abstract `Client`

      type BaseClient interface {
          TimeOffset() *atomic.Int64
          Options() Options
      }

- Add method `TimeOffset()` to `Client`
- Replace usage of `*Client` with the interface
  - `finalizeOperationRetryMaxAttempts`
  - `addTimeOffsetBuild`
  - `initializeTimeOffsetResolver` -> remove it and initialize the timeout inside the constructor `NewClient`
- Modify `stackFn` signature to `func(BaseClient, *middleware.Stack, Options)`
- Apply new `stackFn` signature where it's necessary
- Convert `addOperationXXX` unexposed methods into functions that use `c BaseClient` as an argument.
Lazy
- Create a "factory store" inside `Client` to register `addXXX` functions.

      type Client struct {
          // ...
          factory sync.Map
          // ...
      }

- Create one register function (not method) per exposed method.

      func RegisterAddXXX(c *Client) {
          c.factory.Store("AddXXX", addOperationXXX)
      }

- Use the "factory store" inside exposed methods.

      func (c *Client) ListXXX(ctx context.Context, params *ListXXXInput, optFns ...func(*Options)) (*ListXXXOutput, error) {
          rawFunc, ok := c.factory.Load("ListXXX")
          if !ok {
              return nil, errors.New("operation ListXXX not found")
          }
          fn, ok := rawFunc.(func(ctx context.Context, c BaseClient, params *ListXXXInput, optFns ...func(*Options)) (*ListXXXOutput, error))
          if !ok {
              return nil, errors.New("invalid type for ListXXX operation")
          }
          return fn(ctx, c, params, optFns...)
      }

- The existing interfaces related to `Client` should embed the `BaseClient` interface if necessary.
- Create a "lazy" client constructor with an empty map.
- Inside the "old" `NewClient`, set up the "factory" with all functions.
Notes
This solution is less effective in binary size reduction, because the structure ListXXXInput and ListXXXOutput are still attached to the client, but the other structures (serialization/deserialization) are loaded "on-demand".
It feels less user-friendly because a user of the new constructors needs to "initialize" the "factory".