azure-cli
[Packaging] Optimize Dockerfile build
This PR makes the built Docker artifact smaller by introducing a multi-stage build in the Dockerfile, trimming ~250MB from the exported image.
This checklist is used to make sure that common guidelines for a pull request are followed.
- [x] The PR title and description follow the guidelines in Submitting Pull Requests.
- [x] I adhere to the Command Guidelines.
- [x] I adhere to the Error Handling Guidelines.
Thank you for your contribution hholst80! We will review the pull request and get back to you soon.
@microsoft-github-policy-service agree
Packaging
2 weeks of no activity. I am pretty sure I did not break anything with my changes. The "Check the Format of Pull Request Title and Content" issue should have been fixed 2 weeks ago.
Including the `__pycache__` cleanup, the built image is down to ~704MB (uncompressed) on my local build. Down from 1.3GB, that is worthwhile. I think the same cleanup (removing `__pycache__`) should be included in the rpm and deb builds as well. Either ship the .py or just the .pyc, but not both?
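A minimal sketch of the ship-only-.pyc idea for the rpm/deb builds (the `demo` package and paths here are made up for illustration; the real packaging scripts would differ):

```shell
# Build a throwaway package, compile it to legacy-layout byte code,
# then drop the sources. -b writes demo/__init__.pyc next to the .py
# instead of under __pycache__/, so the package imports without sources.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/demo"
printf 'def greet():\n    return "hi"\n' > "$tmp/demo/__init__.py"

python3 -m compileall -b -q "$tmp/demo"
find "$tmp/demo" -name '*.py' -delete
rm -rf "$tmp/demo/__pycache__"

# The package still imports from byte-code alone.
PYTHONPATH="$tmp" python3 -c 'import demo; print(demo.greet())'
```

The trade-off noted later in this thread applies: sourceless packages lose source lines in tracebacks.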
The latest image has been reduced to 1GB:
mcr.microsoft.com/azure-cli latest cb37d43fe470 5 weeks ago 1GB
But still not enough.
@hholst80
Can you confirm, with the latest version 2.51.0, whether your PR can still reduce an extra 200MB?
======
There is some discussion in https://github.com/Azure/azure-cli/issues/19591; I will add my comments and discoveries here.
https://github.com/Azure/azure-cli/issues/19591#issuecomment-1405972500
The reason it is so big is that azure-cli keeps all versions of the code for each component. For example, for `az network`, it keeps 33 copies when you install azure-cli in Docker. We need to figure out how to install only the latest version.
$ pwd
/app/.venv/lib/python3.11/site-packages/azure/mgmt/network
$ ls -l
...
9708 v2018_11_01
9964 v2018_12_01
10180 v2019_02_01
10768 v2019_04_01
11280 v2019_06_01
11640 v2019_07_01
11860 v2019_08_01
11944 v2019_09_01
12172 v2019_11_01
12344 v2019_12_01
12736 v2020_03_01
12924 v2020_04_01
13592 v2020_05_01
14180 v2020_06_01
14420 v2020_07_01
14596 v2020_08_01
14700 v2020_11_01
14880 v2021_02_01
16984 v2022_01_01
$ ls -ld v*|wc -l
33
In general, this SDK is present about 33 times over; that's why it is so big.
We previously trimmed Python SDKs for Windows and Linux packages: https://github.com/Azure/azure-cli/pull/23946. We will apply the same logic to docker images.
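The trimming idea can be sketched roughly as follows (names are hypothetical; the real `scripts/trim_sdk.py` heuristics may differ): in each SDK package, keep only the newest vYYYY_MM_DD-style API-version folder and delete the rest.

```python
# Sketch: keep only the latest versioned sub-package in an SDK directory.
import re
import shutil
import tempfile
from pathlib import Path

VERSION_DIR = re.compile(r"^v\d{4}_\d{2}_\d{2}$")

def trim_api_versions(sdk_root: Path) -> list[str]:
    """Remove all but the latest versioned sub-package; return removed names."""
    versions = sorted(d.name for d in sdk_root.iterdir()
                      if d.is_dir() and VERSION_DIR.match(d.name))
    removed = versions[:-1]  # zero-padded date names sort chronologically
    for name in removed:
        shutil.rmtree(sdk_root / name)
    return removed

# Demo on a throwaway tree mimicking azure/mgmt/network:
root = Path(tempfile.mkdtemp()) / "network"
for v in ("v2019_02_01", "v2022_01_01", "v2020_06_01"):
    (root / v).mkdir(parents=True)
removed = trim_api_versions(root)
print(removed)                            # the two older versions
print([d.name for d in root.iterdir()])   # only the newest remains
```

This only works for Docker-style installs where hybrid-cloud profiles pinned to older API versions are not needed.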
Thanks for the update. Please share the ticket or issue for the docker image so I can follow up (subscribe to it).
We're down to 485 MB now (compressed). If we keep this PR open long enough, we will eventually get it below 100 MB. ;-)
FYI, keeping all API versions costs 115 MB (again, compressed)
If anyone can give a green light to removing the "useful tools" included (jq, curl, and jp), we can get the image size down to 438 MB. Actually, if we are very aggressive and remove git, perl, and such too, below 400 MB is possible. This will break workflows (I was myself using curl inside the container), and I am not sure the breakage is worth the small extra saving in size. It is however odd that the container exposed this tooling to begin with, because now there are users depending on it. Including myself...
If this flies with the CI/CD gods, the size is almost acceptable:
🔄AzureCLI-FullTest (full per-module matrix collapsed: every module check passed on Python 3.9 and 3.11 across the latest and hybrid profiles, except containerapp, whose latest/3.11 runs were still in progress)
️✔️AzureCLI-BreakingChangeTest
️✔️Non Breaking Changes
Yes, you are right. I use it in CI/CD. Thanks for the efforts!
@hholst80
Could you show me how to build the image locally? Is it fine to run `docker build -t azure_cli .` from the root of the repo?
Updates: I successfully built it locally and confirmed it is less than 500MB. I can use it in my pipeline directly. Thanks.
I just ran 'dive' on this PR and it looks like there's a potential for another 111MB of savings...
@hholst80 - Could you also run dive and see what it tells you?
https://github.com/wagoodman/dive
Great catch there, thank you @jongio. The merge COPY was indeed a bit sloppy. We're up to 99% efficiency per the dive analysis. I am using buildx extensions here, but I hope that is no concern; we'll see whether the build CI/CD throws an error.
EDIT: the build passed just fine, so buildx is enabled and functioning as expected with the dockerfile:1.2 syntax.
Do we need all of these installed? Are these just needed for building or do we shell out to them?
# We don't use openssl (3.0) for now. We only install it so that users can use it.
# libintl and icu-libs - required by azure devops artifact (az extension add --name azure-devops)
RUN apk add --no-cache \
bash bash-completion openssh-keygen ca-certificates openssl \
curl jq git perl zip \
libintl icu-libs libc6-compat \
&& update-ca-certificates
The `jp` installation is also hard-coded to amd64. Do we want to support native arm64, such as Apple silicon or Surface users?
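One way to avoid the hard-coded amd64 is a small mapping from the kernel's machine name to the Go-style arch names used by jp's release assets (`jp-linux-amd64`, `jp-linux-arm64`); the variable names below are made up for illustration:

```shell
# Map uname -m output to the arch suffix of jp's release binaries.
machine=$(uname -m)
case "$machine" in
    x86_64)        jp_arch=amd64 ;;
    aarch64|arm64) jp_arch=arm64 ;;
    *) echo "unsupported arch: $machine" >&2; exit 1 ;;
esac
echo "would fetch jp-linux-${jp_arch}"
```

The Dockerfile in this PR later adopted the equivalent `arch | sed` one-liner.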
@hholst80 - Can you look into using a Chiselled image?
https://ubuntu.com/blog/benefits-chiselled-ubuntu-images-aspnet-eshop-demo https://github.com/canonical/chisel
I think it is out of scope for this PR to remove or add to the existing tooling. Doing so will break anywhere between zero and a million users' workflows, so it requires some sensitivity, cost-benefit analysis, etc.
Removing curl broke my own workflows, so it is a real problem, not just a theoretical issue.
This PR is taking forever to get across, and it's getting better and better, but it will not provide much value until it is merged.
@jiasli - I'm wondering if we have any CLI perf tests to see if removing the `__pycache__` impacts runtime perf.
Compared to the overhead of starting the container, I'd assume it's negligible. Actually, the total time could even be larger with the cache, since the image layers need to be unpacked and copied onto storage; it could be cheaper to just build the byte code on the fly.
Edit: validation of assumptions
Benchmark with and without `__pycache__`
Overhead of starting the docker container itself:
[I] root@fedora ~/azure-cli (dev)
# time docker run -it --rm azure-cli echo foo
Executed in 246.94 millis fish external
usr time 9.20 millis 257.00 micros 8.94 millis
sys time 6.77 millis 74.00 micros 6.70 millis
Executed in 274.07 millis fish external
usr time 8.17 millis 304.00 micros 7.86 millis
sys time 8.95 millis 107.00 micros 8.84 millis
Executed in 266.15 millis fish external
usr time 9.38 millis 245.00 micros 9.13 millis
sys time 7.52 millis 86.00 micros 7.44 millis
Running `az --version` with `__pycache__` (692MB image):
Executed in 1.98 secs fish external
usr time 8.65 millis 386.00 micros 8.26 millis
sys time 7.43 millis 0.00 micros 7.43 millis
Executed in 1.82 secs fish external
usr time 10.15 millis 0.00 micros 10.15 millis
sys time 6.44 millis 332.00 micros 6.10 millis
Executed in 1.78 secs fish external
usr time 9.04 millis 266.00 micros 8.77 millis
sys time 7.87 millis 84.00 micros 7.79 millis
Running `az --version` without `__pycache__` (411MB):
Executed in 2.72 secs fish external
usr time 10.46 millis 254.00 micros 10.20 millis
sys time 5.18 millis 77.00 micros 5.11 millis
Executed in 2.58 secs fish external
usr time 9.28 millis 258.00 micros 9.03 millis
sys time 7.33 millis 78.00 micros 7.25 millis
Executed in 2.56 secs fish external
usr time 8.35 millis 257.00 micros 8.09 millis
sys time 8.17 millis 79.00 micros 8.09 millis
Running something heavier, like `az --help`.
Running `az --help` with `__pycache__` (692MB):
Executed in 8.37 secs fish external
usr time 9.06 millis 289.00 micros 8.78 millis
sys time 8.90 millis 95.00 micros 8.80 millis
Executed in 8.17 secs fish external
usr time 10.80 millis 246.00 micros 10.55 millis
sys time 7.78 millis 82.00 micros 7.70 millis
Executed in 8.27 secs fish external
usr time 10.30 millis 303.00 micros 9.99 millis
sys time 8.12 millis 101.00 micros 8.02 millis
Running `az --help` without `__pycache__` (411MB):
Executed in 9.59 secs fish external
usr time 10.87 millis 0.00 micros 10.87 millis
sys time 6.88 millis 353.00 micros 6.52 millis
Executed in 8.91 secs fish external
usr time 11.70 millis 279.00 micros 11.42 millis
sys time 8.00 millis 94.00 micros 7.91 millis
Executed in 8.88 secs fish external
usr time 11.72 millis 0.00 micros 11.72 millis
sys time 7.84 millis 336.00 micros 7.50 millis
Here I realized that the `__pycache__` elimination was too aggressive: I removed caches not only from the azure package. I fixed that in the latest commit.
diff --git a/Dockerfile b/Dockerfile
index 794af5e3b..1c85845cf 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -33,7 +33,7 @@ COPY . /azure-cli
RUN ./scripts/install_full.sh
RUN python3 ./scripts/trim_sdk.py
RUN /usr/local/bin/az && bash -c "source /usr/local/bin/az.completion.sh"
-RUN find /usr/local -name __pycache__ | xargs rm -rf
+RUN find /usr/local/lib/python*/site-packages/azure -name __pycache__ | xargs rm -rf
# Install jp tool
RUN <<HERE
Now the timings are much more comparable, but the size increased slightly. The delta is likely due to dependencies of the azure package outside its namespace.
[I] root@fedora ~/azure-cli (dev)
# time docker run -it --rm azure-cli az --help > /dev/null
________________________________________________________
Executed in 8.11 secs fish external
usr time 15.45 millis 5.24 millis 10.21 millis
sys time 9.74 millis 2.03 millis 7.71 millis
[I] root@fedora ~/azure-cli (dev)
# time docker run -it --rm azure-cli az --help > /dev/null
________________________________________________________
Executed in 8.07 secs fish external
usr time 11.09 millis 332.00 micros 10.76 millis
sys time 6.93 millis 89.00 micros 6.84 millis
[I] root@fedora ~/azure-cli (dev)
# time docker run -it --rm azure-cli az --version > /dev/null
________________________________________________________
Executed in 1.70 secs fish external
usr time 8.79 millis 0.00 micros 8.79 millis
sys time 7.48 millis 445.00 micros 7.04 millis
[I] root@fedora ~/azure-cli (dev)
# time docker run -it --rm azure-cli az --version > /dev/null
________________________________________________________
Executed in 1.72 secs fish external
usr time 10.45 millis 330.00 micros 10.12 millis
sys time 7.92 millis 88.00 micros 7.84 millis
[I] root@fedora ~/azure-cli (dev)
#
The `jp` installation is also hard-coded to amd64. Do we want support for native arm64, such as Apple silicon or Surface users?
This was an oversight in a merge on my side. It is included now, with support for the same architectures as in the dev batch.
@bebound, what are your final thoughts on the multi-stage build? Should we keep it as-is? (I reduced it to a single build stage and merged the 'tooling' stage with it.) We can reduce it to a single build stage and remove the .build-deps, but I think the result is uglier; also, for some reason that build is 1MB larger than the multi-stage one. Gods know why. I am OK with dropping the multi-stage if you think this PR is better without it.
# syntax=docker/dockerfile:1.4
#---------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
#---------------------------------------------------------------------------------------------
ARG PYTHON_VERSION="3.11"
FROM python:${PYTHON_VERSION}-alpine
ARG JP_VERSION="0.2.1"
ARG CLI_VERSION
# Metadata as defined at http://label-schema.org
ARG BUILD_DATE
LABEL maintainer="Microsoft" \
org.label-schema.schema-version="1.0" \
org.label-schema.vendor="Microsoft" \
org.label-schema.name="Azure CLI" \
org.label-schema.version=$CLI_VERSION \
org.label-schema.license="MIT" \
org.label-schema.description="The Azure CLI is used for all Resource Manager deployments in Azure." \
org.label-schema.url="https://docs.microsoft.com/cli/azure/overview" \
org.label-schema.usage="https://docs.microsoft.com/cli/azure/install-az-cli2#docker" \
org.label-schema.build-date=$BUILD_DATE \
org.label-schema.vcs-url="https://github.com/Azure/azure-cli.git" \
org.label-schema.docker.cmd="docker run -v \${HOME}/.azure:/root/.azure -it mcr.microsoft.com/azure-cli:$CLI_VERSION"
# We don't use openssl (3.0) for now. We only install it so that users can use it.
# libintl and icu-libs - required by azure devops artifact (az extension add --name azure-devops)
RUN apk add --no-cache \
bash bash-completion openssh-keygen ca-certificates openssl \
curl jq git perl zip \
libintl icu-libs libc6-compat \
&& update-ca-certificates
RUN --mount=target=/mnt --mount=type=cache,target=/root/.cache --mount=type=tmpfs,target=/root/.azure <<HERE
apk add --no-cache --virtual .build-deps gcc make openssl-dev libffi-dev musl-dev linux-headers
mkdir /azure-cli
cd /mnt
cp -ar . /azure-cli
cd /azure-cli
./scripts/install_full.sh
python3 ./scripts/trim_sdk.py
cd /
rm -rf /azure-cli
/usr/local/bin/az && bash -c "source /usr/local/bin/az.completion.sh"
ln -s /usr/local/bin/az.completion.sh /etc/profile.d/
find /usr/local/lib/python*/site-packages/azure -name __pycache__ | xargs rm -rf
arch=$(arch | sed s/aarch64/arm64/ | sed s/x86_64/amd64/)
curl -L https://github.com/jmespath/jp/releases/download/${JP_VERSION}/jp-linux-${arch} -o /usr/local/bin/jp
chmod +x /usr/local/bin/jp
apk del --no-cache .build-deps
HERE
ENV AZ_INSTALLER=DOCKER
CMD bash
I've created a draft pr to remove the unnecessary packages: https://github.com/Azure/azure-cli/pull/27567
Removing `COPY . /azure-cli` is in #27683.
Removing `__pycache__` saves image size, but it increases the container size. We created a PR which keeps only the .pyc files: https://github.com/Azure/azure-cli/pull/25801, but we did not merge it.
Not sure if this change has been implemented, but the image size is already reduced in the latest versions (images built after Nov 14):
mcr.microsoft.com/azure-cli latest 9f400a9915c1 2 weeks ago 717MB
azure_cli latest 791d5491db53 2 months ago 479MB
But it is still 200MB+ more than the image I built with this PR.
The two enhancements mentioned in the previous comment were merged in 2.54, and the image size dropped to 700MB.
There is no plan to keep only the .py or only the .pyc files for now. Removing the .pyc files impacts execution time and increases container size, which makes no sense. Keeping only .pyc is applied in the MSI; I'm not sure we need to do the same on Linux:
- It removes the source code from error stack traces.
- This technique is not widely adopted among other projects.
Removing pyc file impacts the execution time and increases container size, which makes no sense.
That was not what my benchmark tests showed, though. There was no impact on `az --version` nor `az --help`. The container size can be fixed by adding `PYTHONDONTWRITEBYTECODE=1` to the ENV section. It might even make execution faster, so it should be added; there is no downside to adding it in this PR. I do not have time to keep my PR mergeable; just pull out the Dockerfile (with or without the multi-stage) and copy it in if anyone is interested.
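For reference, the effect of `PYTHONDONTWRITEBYTECODE=1` can be checked with a throwaway module (paths below are made up for the demo):

```shell
# With the flag set, importing leaves no __pycache__ behind;
# without it, the interpreter writes one on first import.
set -e
tmp=$(mktemp -d)
echo 'x = 1' > "$tmp/mod.py"

PYTHONDONTWRITEBYTECODE=1 PYTHONPATH="$tmp" python3 -c 'import mod'
[ ! -d "$tmp/__pycache__" ] && echo "no cache written with flag"

PYTHONPATH="$tmp" python3 -c 'import mod'
[ -d "$tmp/__pycache__" ] && echo "cache written without flag"
```

This only controls writing the cache; whether skipping the cache costs startup time is exactly what the benchmarks in this thread disagree about.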
@hholst80 I highly doubt your test result that compiling bytecode on each run does not slow down execution. According to my tests, the execution time for `az` increased from 2s to 7s after removing the .pyc files.
Let me clarify my point.
- Removing .pyc files from the image doesn't save disk space, because when a user runs the image, the compiled .pyc files are generated and added to the container layer. This approach actually uses more disk space when there are multiple containers.
- If we remove the .pyc files and add `PYTHONDONTWRITEBYTECODE=1` to the image env, this slows down execution.
In conclusion, I prefer keeping pyc.
I would, too, if I saw that kind of result.
- Which pycache files did you remove? Removing the standard library's pycache offers no benefit, as those are baked into the base image anyway.
- How and which subcommand did you benchmark, and with what arguments?
Here is the code:
docker run --rm -it mcr.microsoft.com/azure-cli:2.55.0 bash
find /usr/local/lib/python3.11/site-packages/azure -name '*.pyc' -exec rm -f {} +
PYTHONDONTWRITEBYTECODE=1 time az