botocore
Support SDK modularization on a service basis
A common use case may be to use some boto functions in a client app frozen with PyInstaller; in this case botocore will add 30 MB to the install size.
Is there any plan to document how to trim the install size? For example:
- Specify which services you are going to use via setuptools install extras
- Archive old service definitions outside the PyPI package
As of right now we don't have any official guidance on slimming down the package size to remove unused services.
We wouldn't be able to do the second one, as it's possible to specify that you want to use older API versions.
This is effectively a feature request to modularize the Python SDK in a similar fashion to some of the other SDKs (Java, Ruby, etc.). Marking as a feature request.
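The older-API-versions point comes from botocore's data layout: each service directory contains one subdirectory per dated API version (e.g. `botocore/data/ec2/2016-11-15/service-2.json`), and a caller may pin any of those dates. A hedged sketch on a scratch directory (the dates below are illustrative, not a real inventory):

```shell
# Simulate a service's data directory with several dated API versions.
data=$(mktemp -d)
mkdir -p "$data/ec2/2014-10-01" "$data/ec2/2015-10-01" "$data/ec2/2016-11-15"

# The newest API version is simply the lexicographically greatest date,
# which is why "archive the old ones" would break anyone pinning a date.
latest=$(ls -1 "$data/ec2" | sort -r | head -1)
echo "$latest"   # 2016-11-15

rm -rf "$data"
```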
Any updates? With the AWS Lambda unpacked limit being 250 MB, the 42 MB of botocore is really a lot.
The AWS JavaScript SDK v3 has pulled this off. It would be great to have the same for Boto3, which will soon be unusable for AWS Lambdas.
For those looking for a short-term solution: a customer wrote an article demonstrating how to selectively discard services you are sure you don't use.
https://blog.cubieserver.de/2020/building-a-minimal-boto3-lambda-layer/
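The core of that whitelist-and-delete approach can be sketched as follows. This runs on a scratch directory with stand-in service names, not a real botocore install, so it is safe to try anywhere; point the same loop at `site-packages/botocore/data` to use it for real.

```shell
# Scratch tree standing in for botocore/data with four service directories.
data=$(mktemp -d)
mkdir -p "$data/s3" "$data/ec2" "$data/sqs" "$data/iam"

# Keep only the whitelisted services; delete every other data directory.
keep="s3 sqs"
for d in "$data"/*/; do
  name=$(basename "$d")
  case " $keep " in
    *" $name "*) ;;        # whitelisted -- keep
    *) rm -rf "$d" ;;      # not needed  -- delete
  esac
done

remaining=$(ls "$data" | sort | xargs)
echo "$remaining"   # s3 sqs
rm -rf "$data"
```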
Any hint or indication of when this might be worked on?
```
1.3M boto3
55M  botocore
696K dateutil
148K jmespath
552K s3transfer
36K  six.py
824K urllib3
```
Following up as we're now at 64M - I understand this would be a huge undertaking considering how these are created today, so I'm primarily interested in hearing whether the team is considering a modularization in the future.
+1
Copying my comment from https://github.com/boto/botocore/issues/2842#issuecomment-1367484168:

> On further inspection it looks like about 70MB out of that 72.5MB is just the `data/` directory. :exploding_head:
>
> I'm sure there's options here. One could be to split each service into an individual package (e.g. `botocore-a-la-carte-s3`, etc.). `botocore-a-la-carte` would contain core code, with an extra per service (e.g. `botocore-a-la-carte[s3, cloudfront]`), and to maintain backwards compatibility `botocore` would simply be `botocore-a-la-carte[all]`.
From https://github.com/thejcannon/botocore-a-la-carte I've started publishing `botocore-a-la-carte`, with an additional package per service provided as an extra on the main package. E.g. `botocore-a-la-carte` just has the Python code and core resources; `botocore-a-la-carte[s3]` also installs the S3 data, etc.
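Hedged usage sketch, assuming the extras are named after services as described above: a consumer who only needs a couple of services could then write a requirements file like this instead of depending on the full `botocore`.

```
# requirements.txt (illustrative -- the extras names are assumed
# from the description above, not verified against the package)
botocore-a-la-carte[s3,cloudfront]
```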
What about different API versions? Are the old versions still needed?
```
$ du -hs *
1,2M boto3
48K  boto3-1.28.44.dist-info
83M  botocore
220K botocore-1.31.44.dist-info
```
83M! In a (not so unusual) venv where all ~30 libraries together take 133M, botocore is 63% of the total, and the only library even close to it is cryptography 41.0.3 at 14M.
This is so, so much: Alpine Linux is a whole operating system with a required disk size of 130M.
And looking at how the size is evolving, we will probably see 100M in 2024 :disappointed: :cry:
I used this myself:

```bash
# Join arguments into a find(1) expression: "-path */a -o -path */b -o ..."
function join_elements {
    local prefix="-path */" separator=" -o "
    local prefixed=("${@/#/$prefix}")
    local rest=("${prefixed[@]:1}")
    local separated=$(printf "%s" "${prefixed[0]}${rest[@]/#/$separator}")
    echo "${separated[@]}"
}

# Within one service's data directory, delete every API version except the
# newest (the version directories are dates, so a reverse sort finds it).
function remove_but_latest {
    local latest=$(ls -1 "botocore/data/$1" | sort -r | head -1)
    find "botocore/data/$1" -mindepth 1 -maxdepth 1 -type d -not -path "*/${latest}" -exec rm -vr '{}' +
}

# Delete the data directories of every service not named as an argument,
# then trim the kept services down to their latest API version.
function keep_components {
    local component
    local find_params=$(join_elements "$@")
    set -o noglob   # $find_params must word-split but not glob-expand
    find botocore/data -mindepth 1 -maxdepth 1 -type d -not \( $find_params \) -prune -exec rm -vr '{}' +
    set +o noglob
    for component in "$@"; do remove_but_latest "$component"; done
}

keep_components cloudformation dynamodb ec2 elbv2 ssm sso sts
```
It seems to work, and helped me reduce the Docker image of my application to 83 MB; not sure if it can cause issues.
Thanks! That's a useful overview.
I ended up with a dead-simple `cp` + `rm` solution. I'm only using `s3`, and it seems that everything (or most things) ending with `*.json` is also used by the common code.
```dockerfile
RUN mkdir /tmp/data \
    && cp -r /usr/local/lib/python3.11/site-packages/botocore/data/s3 /tmp/data/ \
    && cp -f /usr/local/lib/python3.11/site-packages/botocore/data/*.json /tmp/data \
    && rm -rf /usr/local/lib/python3.11/site-packages/botocore/data \
    && cp -r /tmp/data /usr/local/lib/python3.11/site-packages/botocore/ \
    && rm -rf /tmp/data
```
Seems to work just fine, probably until it doesn't, so having official support would be a nice thing.
PS. The above gives an 83M -> 5M drop.