ribbit-network-frog-hardware
Investigate reducing co2 and gpsd image sizes
See if we can reduce the sizes of the images to speed up updates, which allows a new device to come online faster and makes developing for our devices less of a waiting game! It also reduces wasted bandwidth ;)
An easy way that I've found to reduce image sizes is to use Docker's multi-stage builds feature. We should be able to add a build step which starts a fresh container, then copies all of the compiled/downloaded modules from the previous step, taking just what we need to run.
The simplest image that we could create would just contain the code and its dependencies, but it may be worth keeping some tools around that are helpful for debugging (e.g. `i2cdetect`).
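As a rough sketch (the base image tags, the `requirements.txt` manifest, and the `co2.py` entrypoint here are illustrative, not our actual Dockerfile), the layout could look something like:

```dockerfile
# Build stage: has compilers and headers, never ships to a device
FROM balenalib/%%BALENA_MACHINE_NAME%%-python:3-buster-build AS build
WORKDIR /app
COPY requirements.txt ./
# Install into a virtualenv so the whole dependency tree is one directory
RUN python -m venv /opt/venv && /opt/venv/bin/pip install -r requirements.txt

# Runtime stage: start fresh, copy over only what we need to run
FROM balenalib/%%BALENA_MACHINE_NAME%%-python:3-buster-run
WORKDIR /app
# Keep i2cdetect around for debugging (it lives in the i2c-tools package)
RUN install_packages i2c-tools
COPY --from=build /opt/venv /opt/venv
COPY . ./
ENV PATH="/opt/venv/bin:$PATH"
CMD ["python", "co2.py"]
```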
I did an experiment to see how much we can save.
- "phase 1": https://github.com/abesto/ribbit-network-frog-sensor/commit/01ffab3a7d945f9f1cb37523d8e41511ec0503b9 - multi-stage build using a virtualenv
- "phase 2": https://github.com/abesto/ribbit-network-frog-sensor/commit/e456650f5fdb9ac3ec98a4276b9f303c34541424 - pay ridiculous amounts of build time for extra space savings
And the numbers:
- Base image: 243MB
- Current `co2` image size: 540MB
- phase 1: 438MB
- phase 2: 423MB
Phase 2 is (at least in my `buildx` x-plat build environment) ridiculously slow, so definitely not worth it for the extra 15MB saving. Hooowever, phase 1 seems like it's worth doing? It's a ~19% size decrease (102MB off the current 540MB). Not as dramatic as you'd see with a static binary running on Alpine or whatever, but we are talking about Python. To get more, we'd probably need to start trimming the actual dependencies.
(Note: we should probably have set `PYTHONDONTWRITEBYTECODE=1` in phase 1; it should be zero overhead and save some space on `.pyc` files.)
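In Dockerfile terms that's a one-liner in the build stage, something like:

```dockerfile
# Don't write .pyc files during the build-stage pip install;
# they'd only bloat the venv we copy into the runtime image
ENV PYTHONDONTWRITEBYTECODE=1
```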
... A quick look through the output of the `install_packages` call we start with shows stuff like fonts and `libgtk2` being installed, so I'm guessing there's some further space to be saved by tracking down packages that think we need a desktop environment, and convincing them of the error of their ways.
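One way to hunt them down (a throwaway debugging step, not something to ship) is to sort the installed packages by size inside the image:

```dockerfile
# Temporary debugging step: print the 20 largest installed packages so we can
# see which ones are dragging in fonts/GTK; remove before shipping
RUN dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -rn | head -n 20
```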
Instead of copying all the content you could copy in just the files needed for the installs (`pyproject.toml` and the lock file?). That way, when you make changes to `co2.py` it won't bust your cache; it will only rebuild when it needs new packages in the build step:
```dockerfile
# This will copy all files in our root to the working directory in the container
# It's almost guaranteed to bust the Docker build cache, so do it as late as possible
COPY . ./
```
https://github.com/abesto/ribbit-network-frog-sensor/blob/e456650f5fdb9ac3ec98a4276b9f303c34541424/software/co2/Dockerfile.template#L48-L50
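Something along these lines, assuming a `requirements.txt` manifest (with `pyproject.toml` plus a lock file, the same ordering works via e.g. `poetry install --no-root`):

```dockerfile
# Copy only the dependency manifest first: this layer (and the install below)
# stays cached until the dependency list itself changes
COPY requirements.txt ./
RUN pip install -r requirements.txt

# Copy the rest of the source as late as possible; editing co2.py now only
# invalidates this layer, not the dependency install above
COPY . ./
```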
This step isn't really going to give any advantage: the build stage is disposable and never makes it to the devices. Due to the layering in Docker and build caching, it also won't really save space on your local machine. At the moment it will just slow the build down ever so slightly while you wait for the uninstall process to complete.
```dockerfile
# Save a tiny amount of space: we don't need these at runtime
RUN pip uninstall -y wheel pip
```
https://github.com/abesto/ribbit-network-frog-sensor/blob/e456650f5fdb9ac3ec98a4276b9f303c34541424/software/co2/Dockerfile.template#L55-L56
Using the balena `:buster-build` images could also help build time, as fewer installs will be required in the build step. It will make the build-step image bigger, but as that never lands on a device, there is little reason to worry.
Would also consider locking the Python version. I think it is something like `balenalib/%%BALENA_MACHINE_NAME%%-python:3.12-buster-build`. Otherwise a new Python update, from 3.10 to 3.11 for example, will be pushed out to the devices when using just `python:buster-build`. There is usually little need to be on the edge like that; it can be done intermittently, as and when needed. A Python update inside the container will require a lot of bandwidth to roll out across devices. Not to mention it's helpful for ensuring breaking changes don't slip in.
Come to think of it, I'd say the same for the `pip --upgrade` in there too. It is updating without version control. Better to pin the pip version inside the image and bump it deliberately than to always run on the latest pip; it makes for a predictable environment.
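Concretely, both pins might look something like this (the exact tag and pip version here are placeholders to bump deliberately):

```dockerfile
# Pin the Python minor version so a base-image rebuild can't silently push
# a 3.10 -> 3.11 jump out to every device
FROM balenalib/%%BALENA_MACHINE_NAME%%-python:3.10-buster-build AS build

# Pin pip as well, and bump it on purpose rather than tracking the latest
RUN pip install pip==22.3.1
```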
> Instead of copying all the content you could copy in just the files needed for the installs

Accurate; check out https://github.com/Ribbit-Network/ribbit-network-frog-sensor/blob/main/software/co2/Dockerfile.template, we're actually doing this!
> This step isn't really going to give any advantage: the build stage is disposable and never makes it to the devices.

We install `wheel` and `pip` after setting up the venv, and we ship the venv to the runtime image, so surely removing them saves the space that `pip` and `wheel` take up?
> Using the balena `:buster-build` images could also help build time

Yep, and we're also doing that! :D
> Would also consider locking the Python version. Come to think of it, I'd say the same for the `pip --upgrade` in there too.

`pip` I'm conflicted about, but the Python version, for sure.
Whoops, I was looking at the experiment repos you linked; looks like there is another one merged with a lot of these things already in it.
> We install `wheel` and `pip` after setting up the venv, and we ship the venv to the runtime image, so surely removing them saves the space that `pip` and `wheel` take up?

Ah, I see. Nice catch.
I tend to install with `--user` instead of a virtual env, which would use the global `wheel` package. But they are all much of a muchness. https://pythonspeed.com/articles/multi-stage-docker-python/
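For comparison, a sketch of the `--user` variant from that article, with the base image tags and file names as placeholders:

```dockerfile
FROM balenalib/%%BALENA_MACHINE_NAME%%-python:3.10-buster-build AS build
WORKDIR /app
COPY requirements.txt ./
# Installs under /root/.local; the global pip and wheel stay behind in the
# build image and never get copied across
RUN pip install --user -r requirements.txt

FROM balenalib/%%BALENA_MACHINE_NAME%%-python:3.10-buster-run
WORKDIR /app
COPY --from=build /root/.local /root/.local
COPY . ./
# Make user-installed console scripts resolvable
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "co2.py"]
```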
We're moving to an esp32-based frog for the foreseeable future, so closing this as won't fix for now. Perhaps once the global supply chain clears up a bit, we will revisit this.
New software repo below:
https://github.com/Ribbit-Network/ribbit-network-frog-software