Support SDK modularization on a service basis

Open ciarancourtney opened this issue 5 years ago • 13 comments

A common use case may be to use some boto functions in a client app frozen with PyInstaller; in this case botocore will add 30MB to the install size.

Is there any plan to document how to trim the install size? For example:

  • Specify which services you are going to be using with setuptools install_extras
  • Archive old service definitions outside pypi package

ciarancourtney avatar Aug 31 '18 14:08 ciarancourtney

As of right now we don't have any official guidance on slimming down the package size to remove unused services.

We wouldn't be able to do the second one as it's possible to specify that you want to use older API versions.

This is effectively a feature request to be able to modularize the Python SDK in a similar fashion to some of the other SDKs (Java, Ruby, etc.). Marking as a feature request.

joguSD avatar Aug 31 '18 21:08 joguSD

Any updates? With the AWS Lambda unpacked limit being 250 MB, the 42 MB of botocore is really a lot.

AntonOellerer avatar Nov 15 '19 15:11 AntonOellerer

AWS JavaScript SDK V3 has pulled this off. It would be great to have the same for Boto3, which will soon be unusable for AWS Lambdas.

michaelbrewer avatar Mar 22 '21 19:03 michaelbrewer

For those looking for a short-term solution: a customer wrote an article demonstrating how to selectively discard services you are sure you don't use.

https://blog.cubieserver.de/2020/building-a-minimal-boto3-lambda-layer/

heitorlessa avatar Mar 22 '21 20:03 heitorlessa

Any kind of hint or indication of when this would be worked on?

1.3M	boto3
 55M	botocore
696K	dateutil
148K	jmespath
552K	s3transfer
 36K	six.py
824K	urllib3

michaelbrewer avatar Apr 21 '21 02:04 michaelbrewer

Following up as we're now at 64M - I understand this would be a huge undertaking considering how these are created today, so I'm primarily interested in hearing whether the team is considering a modularization in the future.

heitorlessa avatar Jun 18 '21 09:06 heitorlessa

+1

mccauleyp avatar Nov 23 '21 16:11 mccauleyp

Copying my comment from https://github.com/boto/botocore/issues/2842#issuecomment-1367484168

On further inspection it looks like about 70MB out of that 72.5MB is just the data/ directory. :exploding_head:

I'm sure there are options here. One could be to split each service into an individual package (e.g. botocore-a-la-carte-s3, etc...). botocore-a-la-carte would contain core code with an extra per service (e.g. botocore-a-la-carte[s3, cloudfront]) and, to maintain backwards-compatibility, botocore would simply be botocore-a-la-carte[all].
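
To see which services account for most of the data/ directory, a minimal sketch along these lines could help. It assumes the layout described in this thread (one subdirectory per service under botocore/data); the function names are made up for illustration and are not part of botocore.

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def largest_services(data_dir: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return (service name, bytes) pairs for the largest service directories.

    Assumption: every subdirectory of data_dir is one service. Top-level
    files such as endpoints.json are ignored.
    """
    sizes = [(d.name, dir_size(d)) for d in data_dir.iterdir() if d.is_dir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

For example, `largest_services(Path(botocore.__file__).parent / "data")` would list the ten largest services of the installed botocore.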

thejcannon avatar Dec 29 '22 19:12 thejcannon

From https://github.com/thejcannon/botocore-a-la-carte I've started publishing botocore-a-la-carte with an additional package per service provided as an extra on the main package.

E.g. botocore-a-la-carte just has the Python code and core resources. botocore-a-la-carte[s3] also installs the S3 data, etc...

thejcannon avatar Jan 04 '23 21:01 thejcannon

What about different versions of the API, are the old versions still needed?

takeda avatar Aug 23 '23 20:08 takeda

$ du -hs *

1,2M    boto3
48K     boto3-1.28.44.dist-info
83M     botocore
220K    botocore-1.31.44.dist-info

83M! In a (not so unusual) venv, all ~30 libs take 133M (so 63% is botocore), and only cryptography 41.0.3 comes even close to botocore, at 14M.

This is so, so much; Alpine Linux is a whole system with a required disk size of 130M.

And looking at how the size is evolving, we will probably see 100M in 2024 :disappointed: :cry:

rafsaf avatar Sep 11 '23 21:09 rafsaf

I used this myself:

          # Build a find(1) expression such as: -path */a -o -path */b
          function join_elements {
            local prefix="-path */" separator=" -o "
            local prefixed=("${@/#/$prefix}")
            local rest=("${prefixed[@]:1}")
            local separated=$(printf "%s" "${prefixed[0]}${rest[@]/#/$separator}")
            echo "${separated[@]}"
          }
          # For one service, delete every API version directory except the newest.
          function remove_but_latest {
            local latest=$(ls -1 "botocore/data/$1" | sort -r | head -1)
            find "botocore/data/$1" -mindepth 1 -maxdepth 1 -type d -not -path "*/${latest}" -exec rm -vr '{}' +
          }
          # Delete every service directory under botocore/data except the listed ones.
          function keep_components {
            local component find_params

            find_params=$(join_elements "$@")
            # Disable globbing so the */name patterns reach find unexpanded.
            set -o noglob
            find botocore/data -mindepth 1 -maxdepth 1 -type d -not \( $find_params \) -prune -exec rm -vr '{}' +
            set +o noglob
            for component in "$@"; do remove_but_latest "$component"; done
          }
          keep_components cloudformation dynamodb ec2 elbv2 ssm sso sts

It seems to work, and helped me reduce the Docker image of my application to 83MB; I'm not sure whether it can cause issues.
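
The same idea (keep only the listed services, and only the newest API version of each) can be sketched in Python. This is only an illustration, not an official tool. It assumes that API version directories are named by date, so the greatest name is the newest, and `prune_botocore_data` is a made-up name.

```python
import shutil
from pathlib import Path


def prune_botocore_data(data_dir: Path, keep: set[str]) -> None:
    """Remove service directories not in `keep`, and old API versions of kept ones.

    Top-level files (for example endpoints.json) are left untouched.
    """
    # Materialize the listing first, because directories are deleted in the loop.
    for service_dir in list(data_dir.iterdir()):
        if not service_dir.is_dir():
            continue
        if service_dir.name not in keep:
            shutil.rmtree(service_dir)
            continue
        # Assumption: version directories are named YYYY-MM-DD, so sorting
        # by name puts the newest version last.
        versions = sorted(
            (d for d in service_dir.iterdir() if d.is_dir()),
            key=lambda d: d.name,
        )
        for old in versions[:-1]:
            shutil.rmtree(old)
```

As noted earlier in the thread, dropping old API versions breaks any client that explicitly asks for an older API version.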

takeda avatar Sep 11 '23 22:09 takeda

Thanks! That's a useful overview.

I ended up with a dead simple cp + rm only solution. I'm only using s3, and it seems that everything (or most things) ending with *json is also used by the common code.

RUN mkdir /tmp/data \
    && cp -r /usr/local/lib/python3.11/site-packages/botocore/data/s3 /tmp/data/ \
    && cp -f /usr/local/lib/python3.11/site-packages/botocore/data/*.json /tmp/data \
    && rm -rf /usr/local/lib/python3.11/site-packages/botocore/data \
    && cp -r /tmp/data /usr/local/lib/python3.11/site-packages/botocore/ \
    && rm -rf /tmp/data

Seems to work just fine, probably until it doesn't. So having official support would be a nice thing.

PS. The above code gives an 83M -> 5M drop.
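
Since a trimmed data directory may only fail later at runtime, a build-time check like the following sketch could catch a missing service early. It assumes each API version directory holds a file whose name starts with "service-2.json" (possibly compressed). `missing_service_models` is a hypothetical helper, not a botocore API.

```python
from pathlib import Path


def missing_service_models(data_dir: Path, services: list[str]) -> list[str]:
    """Return the services that have no service model file under data_dir.

    Assumption: models live at <service>/<api-version>/service-2.json,
    optionally with a compression suffix such as .gz.
    """
    missing = []
    for name in services:
        service_dir = data_dir / name
        has_model = service_dir.is_dir() and any(
            service_dir.glob("*/service-2.json*")
        )
        if not has_model:
            missing.append(name)
    return missing
```

Failing the image build when this returns a non-empty list is one way to use it. It does not prove that the remaining files are sufficient, so creating the clients you need in a smoke test is still worthwhile.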

rafsaf avatar Sep 11 '23 23:09 rafsaf