readthedocs-docker-images
Create a `nopdf` version for day-to-day use
Most of the image size and build time comes from the LaTeX tooling required to build PDFs. We should create a `nopdf` version of the images that excludes it. Perhaps we should also have a no-libraries version with only the basic Python tooling; we could use that for building Sphinx docs and conda without the large size.
We ran into issues splitting things up, so this is on hold. I still think it's a good idea; perhaps someone more familiar with Dockerfiles can offer guidance on how to modularize the containers.
Hi @agjohnson!
On the one hand, the base image is astonishingly huge: play-with-docker breaks when trying to pull it because it hits the 4GB disk quota. I know you have very heavy dependencies, which are hard to manage, but I think you are pushing Docker images to the limit of the concept behind containers, which is modularization. Indeed, the default size limit for images is 10GB and this one requires 8.9GB. Luckily, there are two approaches you can take to somewhat alleviate this:
The first one is straightforward: use Docker multi-stage builds to get the `nopdf` (or other) versions at no extra cost. Note that the main feature of multi-stage builds is that intermediate images are cached, so the total build time to generate multiple staged images is the same as building a single one. I cloned this repo and made minimal modifications to show the concept: https://github.com/1138-4EB/readthedocs-docker-images/blob/multi-stage/Dockerfile
Now, to have both images built:

```shell
docker build -t readthedocs/build:nolatex --target rtd-base .
docker build -t readthedocs/build --target rtd-latex .
```

You can see the output in this Travis build: https://travis-ci.org/1138-4EB/readthedocs-docker-images/jobs/321011442
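For reference, the Dockerfile linked above boils down to something like the following sketch. The base image, package names, and stage names here are illustrative, not the actual repo's contents; only the multi-stage pattern itself is the point:

```dockerfile
# Stage 1: Python tooling only, no LaTeX (the `nopdf`/`nolatex` image)
FROM ubuntu:20.04 AS rtd-base
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install sphinx

# Stage 2: extends the base with the LaTeX toolchain for PDF builds
FROM rtd-base AS rtd-latex
RUN apt-get update && apt-get install -y --no-install-recommends \
        texlive-latex-recommended texlive-latex-extra latexmk \
    && rm -rf /var/lib/apt/lists/*
```

Because `rtd-latex` is built `FROM rtd-base`, building both targets reuses the cached base layers, so producing the extra `nolatex` tag costs nothing beyond the tag itself.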
The second one might be a little trickier: make a lightweight base image that acts as an orchestrator, and have it execute every task in its own container (one for LaTeX, a different one for Python, another one for JS...). This can be done by mounting the Docker socket in the orchestrator and using named volumes. It is difficult to make a specific enhancement proposal here, because I don't really know when or how you use the tools installed in the base image.
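As a rough sketch of that orchestrator idea: each build step becomes a `docker run` against the host's socket, with a named volume carrying artifacts between step containers. The image names, volume name, and commands below are hypothetical placeholders, and the snippet only constructs the invocations rather than executing them:

```python
# Sketch of an orchestrator that launches one sibling container per
# build step via the host's Docker socket. Image names ("rtd/python",
# "rtd/latex"), the volume name, and the commands are hypothetical.

def docker_run(image, command, volume="build-output", workdir="/work"):
    """Build a `docker run` invocation that shares a named volume."""
    return [
        "docker", "run", "--rm",
        # the named volume carries artifacts between step containers
        "-v", f"{volume}:{workdir}",
        "-w", workdir,
        image,
    ] + command

# One container per toolchain: Sphinx emits LaTeX sources, then a
# separate LaTeX-only container compiles them to PDF.
steps = [
    docker_run("rtd/python", ["sphinx-build", "-b", "latex", "docs", "latex"]),
    docker_run("rtd/latex", ["latexmk", "-pdf", "latex/docs.tex"]),
]
```

For the orchestrator container itself to reach the host daemon, it would be started with `-v /var/run/docker.sock:/var/run/docker.sock`, which is the socket-mounting part mentioned above.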
I opened a design document that talks a little about this at https://github.com/readthedocs/readthedocs.org/pull/7566
@humitos, you might find buildthedocs/docker and/or buildthedocs/btd inspiring.
Our new Docker image is about ~5GB (#166), which is still big. I did a small test by building a ubuntu20-nopdf version of that PR and it ended up being ~1.5GB 😮 and it built in 1m30s 😮
So, considering that we will have only one image per supported OS version and it shouldn't be rebuilt too often, we can definitely expose a `-nopdf` image to developers using the Local Environment.
@humitos, nice work! Keep up!
Note that the trick we use in buildthedocs is not just the `-nopdf` image, but a complementary latex image (https://hub.docker.com/r/btdi/latex/tags?page=1&ordering=last_updated). The point is that we use one container for running Sphinx (which needs to contain only Sphinx and the user's extensions/dependencies), and then we run a different container for building the PDF (which does not need Sphinx or Python, just LaTeX packages). Overall, we are decoupling source generation from document compilation.
Since you seem to be reevaluating your current stack, I hope that might be inspiring for you too.