readthedocs-docker-images
Create a `nopdf` version for day-to-day use
Most of the image size and build time comes from the LaTeX tooling required to build PDFs. We should create a `nopdf` version of the images that excludes it. Perhaps we should also have a no-libraries version with only the basic Python tooling; we could use that for building Sphinx docs and conda without the large size.
We ran into issues splitting things up, so this is on hold. I still think it's a good idea; perhaps someone more familiar with Dockerfiles can offer guidance on how to modularize the containers.
Hi @agjohnson!
On the one hand, the base image is astonishingly huge: play-with-docker breaks when trying to pull it because it hits the 4GB disk quota. I know you have very heavy dependencies, which are hard to manage, but I think you are pushing Docker images to the limit of the concept behind containers, which is modularization. Indeed, the default size limit for images is 10GB and this one requires 8.9GB. Luckily, there are two approaches you can take to somewhat alleviate this:
The first one is straightforward: use Docker multi-stage builds to get the `nopdf` (or other) versions at no extra cost. Note that the main feature of multi-stage builds is that intermediate images are cached, so the total build time to generate multiple staged images is the same as building a single one. I cloned this repo and made minimal modifications to show the concept: https://github.com/1138-4EB/readthedocs-docker-images/blob/multi-stage/Dockerfile
Now, to have both images built:

```shell
docker build -t readthedocs/build:nolatex --target rtd-base .
docker build -t readthedocs/build --target rtd-latex .
```

You can see the output in this Travis build: https://travis-ci.org/1138-4EB/readthedocs-docker-images/jobs/321011442
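For reference, the Dockerfile linked above boils down to something like the following sketch. The base image, package names, and stage names here are illustrative, not the actual repo's contents; only the multi-stage pattern itself is the point:

```dockerfile
# Stage 1: Python tooling only, no LaTeX (the `nopdf`/`nolatex` image)
FROM ubuntu:20.04 AS rtd-base
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install sphinx

# Stage 2: extends the base with the LaTeX toolchain for PDF builds
FROM rtd-base AS rtd-latex
RUN apt-get update && apt-get install -y --no-install-recommends \
        texlive-latex-recommended texlive-latex-extra latexmk \
    && rm -rf /var/lib/apt/lists/*
```

Because `rtd-latex` is built `FROM rtd-base`, building both targets reuses the cached base layers, so producing the extra `nolatex` tag costs nothing beyond the tag itself.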
The second one might be a little trickier: make a lightweight base image that acts as an orchestrator, and have it execute every task in its own container (one for LaTeX, a different one for Python, another one for JS...). This can be done by mounting the Docker socket in the orchestrator and using named volumes. It is difficult to make a specific enhancement proposal here, because I don't really know when or how you use the tools installed in the base image.
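As a rough sketch of that orchestrator idea: each build step becomes a `docker run` against the host's socket, with a named volume carrying artifacts between step containers. The image names, volume name, and commands below are hypothetical placeholders, and the snippet only constructs the invocations rather than executing them:

```python
# Sketch of an orchestrator that launches one sibling container per
# build step via the host's Docker socket. Image names ("rtd/python",
# "rtd/latex"), the volume name, and the commands are hypothetical.

def docker_run(image, command, volume="build-output", workdir="/work"):
    """Build a `docker run` invocation that shares a named volume."""
    return [
        "docker", "run", "--rm",
        # the named volume carries artifacts between step containers
        "-v", f"{volume}:{workdir}",
        "-w", workdir,
        image,
    ] + command

# One container per toolchain: Sphinx emits LaTeX sources, then a
# separate LaTeX-only container compiles them to PDF.
steps = [
    docker_run("rtd/python", ["sphinx-build", "-b", "latex", "docs", "latex"]),
    docker_run("rtd/latex", ["latexmk", "-pdf", "latex/docs.tex"]),
]
```

For the orchestrator container itself to reach the host daemon, it would be started with `-v /var/run/docker.sock:/var/run/docker.sock`, which is the socket-mounting part mentioned above.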
I opened a design document that talks a little about this at https://github.com/readthedocs/readthedocs.org/pull/7566
@humitos, you might find buildthedocs/docker and/or buildthedocs/btd inspiring.
Our new Docker image is about ~5GB (#166), which is still big. I did a small test by building a ubuntu20-nopdf version of that PR and it ended up being ~1.5GB 😮 and it built in 1m30s 😮
So, considering that we will have only one image per supported OS version and it shouldn't be rebuilt too often, we can definitely expose a `-nopdf` image to developers using the Local Environment.
@humitos, nice work! Keep up!
Note that the trick we use in buildthedocs is not just the `-nopdf` image, but a complementary latex image (https://hub.docker.com/r/btdi/latex/tags?page=1&ordering=last_updated). The point is that we use one container for running Sphinx (which needs to contain only Sphinx and the user's extensions/dependencies), and then we run a different container for building the PDF (which does not need Sphinx or Python, just LaTeX packages). Overall, we are decoupling source generation from document compilation.
Since you seem to be reevaluating your current stack, I hope that might be inspiring for you too.