azure-sdk-for-python

Azure SDK is over 500MB and growing on each release.

Open sodul opened this issue 3 years ago • 43 comments

The Azure SDK is ridiculously large for reasons that I have a hard time understanding. We pip install it for our CI pipelines, and the vast majority of the size of our container comes from the Azure SDK. Within the SDK, the network directory takes almost half of the size because there are 39 versions of the SDK.

I have never seen anyone take such a strange approach to versioning their API clients. I fail to understand why anyone would even want to use the client from 2015 on a cloud product like Azure.

root@1bba10bd1500:~/.pyenv/versions/3.9.2/lib/python3.9/site-packages/azure/mgmt/network# du -shc * | grep M | sort -n 
1.2M	aio
2.4M	v2015_06_15
3.3M	v2016_09_01
3.5M	v2016_12_01
3.7M	v2017_03_01
4.4M	v2017_06_01
4.4M	v2017_08_01
4.9M	v2017_09_01
5.1M	v2017_10_01
5.1M	v2017_11_01
5.1M	v2018_01_01
5.7M	v2018_02_01
6.5M	v2018_04_01
6.6M	v2018_06_01
6.9M	v2018_07_01
8.3M	v2018_08_01
8.4M	v2018_10_01
8.6M	v2018_11_01
8.8M	v2018_12_01
9.0M	v2019_02_01
9.5M	v2019_04_01
10M	v2019_06_01
11M	v2019_07_01
11M	v2019_08_01
11M	v2019_09_01
11M	v2019_11_01
11M	v2019_12_01
12M	v2020_03_01
12M	v2020_04_01
13M	v2020_05_01
13M	v2020_06_01
13M	v2020_07_01
13M	v2020_08_01
259M	total
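
The same per-version measurement can be reproduced without a shell. The following is a minimal sketch, assuming versioned directories are named like `v2015_06_15` as in the listing above; the function names `dir_size` and `version_sizes` are hypothetical. It sums file sizes, so its numbers can differ slightly from `du`, which reports disk blocks.

```python
import re
from pathlib import Path

# Assumption: version directories look like v2015_06_15, optionally with a
# suffix such as _preview.
VERSION_DIR = re.compile(r"^v\d{4}_\d{2}_\d{2}(_\w+)?$")


def dir_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def version_sizes(package_dir: Path) -> dict[str, int]:
    """Map each versioned subdirectory of package_dir to its size in bytes."""
    return {
        child.name: dir_size(child)
        for child in sorted(package_dir.iterdir())
        if child.is_dir() and VERSION_DIR.match(child.name)
    }
```

Calling `version_sizes(Path(".../site-packages/azure/mgmt/network"))` on an installed SDK would return one entry per API version directory, which makes the growth between versions easy to chart.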

Can the default release provide only the latest version of the client libraries, or at least offer a 'lean' version of the SDK? This release model is certainly not sustainable and is causing needless grief to your users.

sodul avatar Apr 05 '21 19:04 sodul

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @aznetsuppgithub.

Issue Details

Author: sodul
Assignees: -
Labels: Mgmt, Network, Service Attention, customer-reported, question

Milestone: -

ghost avatar Apr 05 '21 19:04 ghost

Hi @sodul, thanks for the feedback, we'll investigate asap.

kristapratico avatar Apr 05 '21 19:04 kristapratico

Previously reported in #11149.

jiasli avatar Jun 22 '21 03:06 jiasli

To clarify, #11149 is only about azure-mgmt-network, which is the largest directory, but the problem is present across the entire Azure SDK.

I understand the reasoning for keeping everything for backward compatibility. However, if you do have customers that depend on the old versions, they should pin their requirements to the old pypi.org releases of the Azure SDK rather than forcing everyone to keep a copy of everything around. How about providing two versions of the SDK: one large with everything, one small with just the latest version?

sodul avatar Jun 23 '21 00:06 sodul

Hey, is there any update?

nolaexe avatar Aug 05 '21 06:08 nolaexe

I wrote a script that we run after pip install. It detects the unused versions, and it shrank our azure folder from ~680MB to ~280MB. It cannot go any lower because, for some reason, some of the object model definitions from multiple versions are merged together to make the final list that is then used. The script detects the versions that are used internally by the SDK and preserves them, which makes the script very safe to use.

If there is interest I can open source the script.
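
The detection step described in this comment could be sketched roughly as follows. This is an illustration only and not the actual algorithm of the script: it assumes version directories are named like `v2020_08_01`, treats any mention of another version's name inside a version directory as a cross-version dependency, and takes the versions to keep (for example the default one) from the caller. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: API versions appear in code as module names like v2020_08_01.
VERSION_NAME = re.compile(r"\bv\d{4}_\d{2}_\d{2}(?:_preview)?\b")


def cross_version_references(package_dir: Path) -> set[str]:
    """Return version names that are mentioned from inside another version."""
    found: set[str] = set()
    for source in package_dir.rglob("*.py"):
        # First path component below package_dir, e.g. "v2020_08_01".
        own = source.relative_to(package_dir).parts[0]
        if not VERSION_NAME.fullmatch(own):
            continue  # only inspect files that live inside a version directory
        text = source.read_text(encoding="utf-8", errors="ignore")
        for match in VERSION_NAME.finditer(text):
            if match.group(0) != own:
                found.add(match.group(0))
    return found


def removable_versions(package_dir: Path, keep: set[str]) -> list[str]:
    """List version directories that are neither kept nor referenced by others."""
    protected = set(keep) | cross_version_references(package_dir)
    return sorted(
        child.name
        for child in package_dir.iterdir()
        if child.is_dir()
        and VERSION_NAME.fullmatch(child.name)
        and child.name not in protected
    )
```

A real tool would also have to account for references from outside the package, such as third-party code that imports a specific version, so this only reports candidates and deletes nothing.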

sodul avatar Aug 05 '21 17:08 sodul

We have released our script on GitHub. It does delete a good chunk of the API folders but not all of it. With the script the Azure directory is now just under 300MB instead of over 700MB. It is compatible with most, but not all, third party packages, as long as they do not point to a version that is trimmed.

https://github.com/clumio-code/azure-sdk-trim

sodul avatar Aug 16 '21 20:08 sodul

@kristapratico Following up to see if there is any update on this issue? - Thank you

KranthiPakala-MSFT avatar Nov 01 '21 23:11 KranthiPakala-MSFT

@KranthiPakala-MSFT we are working on this, and there is ongoing discussion on the issue to make sure we consider every possible impact of any decision and that nobody is broken by it.

lmazuel avatar Nov 02 '21 00:11 lmazuel

@lmazuel I think one old proposal that won't break anything is to release a separate azure-sdk-slim with only the latest APIs (the ones used by default), and possibly to do something about comments (iirc, removing comments reduces the size by 30%).

logachev avatar Nov 09 '21 20:11 logachev

Removing non-latest APIs will free about 60% of the disk space needed. A further design issue is that some of the API definitions import prior APIs in order to have a complete set of objects. I have no idea why these API definitions were designed this way, but it is definitely not very good. I had not thought of stripping comments, which means that we could probably extend azure-sdk-trim to remove comments and other useless whitespace. There is probably a tool that 'compresses' Python that we could run. Of course we would not want to remove docstrings; they do help.

sodul avatar Nov 10 '21 01:11 sodul

@sodul Yeah, agreed. So far I saw only keyvault being broken by your tool (which should be fixed soon I guess https://github.com/Azure/azure-sdk-for-python/issues/21623).

I think there are actually 2 scenarios we're talking about. For development, I agree: comments and docstrings are useful. When building a production image, however, docstrings are unnecessary. The only trick is that each removed docstring needs to be replaced with the same number of empty lines, so that line numbers in exceptions stay the same.
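
The line-preserving removal described here can be sketched with the standard `ast` module. This is an illustrative sketch, not something the SDK or azure-sdk-trim is known to do; the function name `strip_docstrings` is hypothetical. It blanks each docstring in place, inserts `pass` only where the docstring was the sole statement of a function or class, and skips docstrings that share a line with other code.

```python
import ast


def strip_docstrings(source: str) -> str:
    """Blank out docstrings while keeping every other line at its line number."""
    lines = source.split("\n")
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, owners) or not node.body:
            continue
        first = node.body[0]
        if not (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            continue  # no docstring on this node
        start, end = first.lineno, first.end_lineno
        # ast column offsets are UTF-8 byte offsets, so slice encoded lines.
        before = lines[start - 1].encode("utf-8")[: first.col_offset]
        after = lines[end - 1].encode("utf-8")[first.end_col_offset:]
        if before.strip() or after.strip():
            continue  # docstring shares a line with other code; leave it alone
        # A body must not become empty, so keep a `pass` if nothing else is left.
        needs_pass = len(node.body) == 1 and not isinstance(node, ast.Module)
        lines[start - 1] = before.decode("utf-8") + "pass" if needs_pass else ""
        for index in range(start, end):
            lines[index] = ""
    return "\n".join(lines)
```

Running Python with `-OO` also discards docstrings at compile time, but it leaves the source files on disk unchanged, so it does not help with image size.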

logachev avatar Nov 10 '21 01:11 logachev

Hi @sodul. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

ghost avatar Dec 15 '21 08:12 ghost

/unresolve

sodul avatar Dec 15 '21 09:12 sodul

We will continue to reduce the size of azure-mgmt-network: https://github.com/Azure/azure-sdk-for-python/issues/21301

msyyc avatar Dec 31 '21 05:12 msyyc

@msyyc it's not specific to azure-mgmt-network; it's a design problem affecting every package in the SDK, and network mgmt just happens to be the most egregious. Unfortunately, it's also not clear which Azure sovereign cloud maps to which version of each SDK.

kapilt avatar Jan 19 '22 16:01 kapilt

Hi @kapilt, this is a multi-API package defined by the service team. Previous versions are retained and accumulate continuously, which also has an impact on our daily SDK releases. In subsequent releases, we will try to use substitution instead of accumulation.

This problem will be solved gradually. Thanks!

BigCat20196 avatar Jan 20 '22 02:01 BigCat20196

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

ghost avatar Jan 31 '22 08:01 ghost

It is good to hear it is being worked on. Is there a more specific timeline that can be shared? @BigCat20196 I see you added the needs-author-feedback label, which triggers the msftbot to try to close the issue. I'm not sure what additional feedback would help move this forward.

sodul avatar Jan 31 '22 09:01 sodul

Any updates? Afaics, 18 months later, there seems to have been negligible progress on fixing either the continued size growth or the SDK design flaw that causes it, and users are forced to manually trim the SDK after the fact using third-party scripts. https://github.com/clumio-code/azure-sdk-trim/

kapilt avatar Oct 11 '22 11:10 kapilt

An update on the size. With the latest SDK the size still seems to be growing. We now see close to 1GB before trimming and over 400MB after trimming:

+ /_install/azure_sdk_trim.py
/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/azure is using 987.1 MB.
/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/azure is now using 416.3 MB.

This is double the size from when this issue was opened. While there has been some acknowledgment of the issue, it does not seem to be prioritized enough.

sodul avatar Dec 16 '22 20:12 sodul

@sodul we're sorry if it felt like we were not working on this, but we have actually been working hard on it for quite some time now. It may seem trivial, but it is not: it is simply not acceptable to remove API version support for customers who need to stay on an old API version (there are various legitimate reasons why this could happen).

We have a first prototype of the Network package we were actually planning to share today: https://github.com/iscai-msft/azure-sdk-for-python/releases/download/azure-mgmt-network%40v23.0.0b1/azure_mgmt_network-23.0.0b1-py3-none-any.whl

Works with any regular pip install:

pip install https://github.com/iscai-msft/azure-sdk-for-python/releases/download/azure-mgmt-network%40v23.0.0b1/azure_mgmt_network-23.0.0b1-py3-none-any.whl

This trims Network to 25% of what it was, without removing support for any API version. Roughly, it smartly deduplicates code that didn't change between API versions, using decorators.

This version only deduplicates operations for now. We're planning to work on models in early 2023 to gain even more before we consider a general release. Once it has been confirmed that this new SDK is correct, we would apply the new system to all SDKs. We started with Network because it's the biggest by far.

This should not break people who never set an API version, and should not break people who were using the API version the way it was documented. It may break code that relies on hidden, undocumented APIs discovered through IntelliSense. For those, the recommendation is to move to the documented way of doing it.
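
The thread does not show how the decorators are implemented. Purely as an illustration of the general idea, the sketch below keeps one copy of each operation and records on it which API versions it is valid for; the decorator `supported_in`, the class `VirtualNetworkOperations`, and the version strings are all hypothetical and are not the SDK's actual API.

```python
import functools


def supported_in(*api_versions: str):
    """Hypothetical decorator: allow a method only for the listed API versions."""
    allowed = frozenset(api_versions)

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if self.api_version not in allowed:
                raise ValueError(
                    f"{method.__name__} is not available in "
                    f"API version {self.api_version}"
                )
            return method(self, *args, **kwargs)

        # Expose the metadata so tooling can inspect it.
        wrapper.supported_api_versions = allowed
        return wrapper

    return decorator


class VirtualNetworkOperations:
    """Toy operations class: one method body serves several API versions."""

    def __init__(self, api_version: str):
        self.api_version = api_version

    @supported_in("2019-12-01", "2020-08-01")
    def get(self, name: str) -> str:
        return f"GET {name}?api-version={self.api_version}"

    @supported_in("2020-08-01")
    def list_usage(self, name: str) -> str:
        return f"GET {name}/usages?api-version={self.api_version}"
```

With this shape, an operation that is identical across versions exists once instead of once per version directory, which is where the size saving would come from.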

If some people in this thread would have the opportunity to confirm that this package is working in their workflow as expected, we would deeply appreciate the feedback. It does pass our testing on our side. This is not production ready of course, not even officially on PyPI yet.

Thanks for the patience, and sorry again for the time it took to get the first prototype.

EDIT: Release on PyPI for easier installation: https://pypi.org/project/azure-mgmt-network/23.0.0b1/

lmazuel avatar Dec 16 '22 22:12 lmazuel

Hello, I fully agree with @sodul. I just packaged the Azure SDK in order to use the Azure Ansible collection in a Docker image, and I'm now at 1.3GB (not only Azure, but most of it).

I understand that you want to provide every API version, but customers who need a particular API version can simply select the matching pip package version and everything will work.

At a minimum, you could provide a specific version / requirement with just the latest working API version for each Azure cloud (cloud, stack ...)

mysiki avatar Jan 06 '23 16:01 mysiki

Hello, we have a new version of Network that de-duplicates models as well. We should be able to publish it as a preview next week, and it brings Network to 5% of its initial size (while keeping full api-version support). We plan to leave it in preview for a week or two, and switch it to stable once we have done enough testing. People using the latest api-version shouldn't see any difference (if they do, it's a bug). People using a specific api-version may have to change their code, as we simplified the code structure (for instance, there is no longer a module named after an api-version). Breaking changes will be detailed in the changelog.

lmazuel avatar Feb 17 '23 17:02 lmazuel

Here is the released version https://pypi.org/project/azure-mgmt-network/23.0.0b2/. Please give it a try and let us know how it goes!

If you have any issues or questions, please create an issue in this repo and tag me.

iscai-msft avatar Feb 21 '23 16:02 iscai-msft

@lmazuel @iscai-msft thank you for the effort on the Network part of the SDK. However, I did a clean install with the latest Azure CLI, azure-cli==2.46.0 (we use both the CLI and the SDK in our CI/CD pipelines), and the azure folder has ballooned to 1.3GB. Even after running azure-sdk-trim the size is 560MB.

I did try installing azure-mgmt-network==23.0.0b2, which shaved off 35MB, but the rest of the SDK is still massive and growing.

> du -shc *
528K	appconfiguration
3.1M	batch
177M	cli
 76K	common
1.3M	core
856K	cosmos
1.2M	data
380K	datalake
1.3M	graphrbac
1.0M	identity
7.8M	keyvault
152K	loganalytics
259M	mgmt
 35M	multiapi
 32K	profiles
392K	storage
9.6M	synapse
499M	total

Can we expect that the new release model will be applied to the rest of the SDK?

sodul avatar Mar 21 '23 05:03 sodul

@sodul yes the plan is to roll out this feature to the other sdks as well. We'll keep this issue updated because this issue isn't specific to the network sdk. Thank you again for your patience

iscai-msft avatar Mar 23 '23 19:03 iscai-msft

https://pypi.org/project/azure-mgmt-network/23.0.0/ is released, and its size is only 5% of the last stable version.

msyyc avatar Mar 30 '23 06:03 msyyc

@iscai-msft is this one addressed?

xiangyan99 avatar Mar 31 '23 16:03 xiangyan99

@xiangyan99 we've addressed network, but we're going to keep this issue open as we address our other large libraries, since this issue is not specific to network

iscai-msft avatar Mar 31 '23 17:03 iscai-msft