repast4py
Very large install size in Docker
I'm building an environment to use in an HPC setting. The goal is to build a Docker container with several Python packages and then convert it to a Singularity file for use on the university's Slurm HPC cluster.
The one thing I've noticed, which isn't great, is that the install size of Repast4py is huge. The Docker image is ~8.2 GB in size. Here are the image layers:
> docker history 799316543ca4
IMAGE CREATED CREATED BY SIZE COMMENT
799316543ca4 9 minutes ago /bin/sh -c #(nop) ENV PYTHONPATH=/repast4py… 0B
91a65220423d 9 minutes ago /bin/sh -c env CC=mpicxx CXX=mpicxx pip inst… 8.16GB
1e7b9cd039e3 19 minutes ago /bin/sh -c pip install -r ./requirements.txt 199MB
6ccb1de7ca54 21 minutes ago /bin/sh -c #(nop) COPY file:ab16ddc3a986b259… 283B
a45824896792 25 minutes ago /bin/sh -c apt-get update && apt-get ins… 340MB
73b513f59526 2 years ago /bin/sh -c #(nop) CMD ["python3"] 0B
<missing> 2 years ago /bin/sh -c set -ex; savedAptMark="$(apt-ma… 9.51MB
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_SHA256… 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_URL=ht… 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_SETUPTOOLS_VER… 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=21… 0B
<missing> 2 years ago /bin/sh -c cd /usr/local/bin && ln -s idle3… 32B
<missing> 2 years ago /bin/sh -c set -ex && savedAptMark="$(apt-… 29.5MB
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_VERSION=3.10.0 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV GPG_KEY=A035C8C19219B… 0B
<missing> 2 years ago /bin/sh -c set -eux; apt-get update; apt-g… 3.11MB
<missing> 2 years ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
<missing> 2 years ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 2 years ago /bin/sh -c #(nop) ADD file:ece5ff85ca549f0b1… 80.4MB
And the Dockerfile (similar to the one in the Repast git repo):
FROM python:3.10.0-slim
RUN apt-get update && \
apt-get install -y mpich \
&& rm -rf /var/lib/apt/lists/*
# Install the python requirements
COPY ./requirements.txt ./requirements.txt
RUN pip install -r ./requirements.txt
# Install repast4py
RUN env CC=mpicxx CXX=mpicxx pip install repast4py
# Set the PYTHONPATH to include the /repast4py folder which contains the core folder
ENV PYTHONPATH=/repast4py/src
It's clear that the 8 GB layer is coming from the RUN env CC=mpicxx CXX=mpicxx pip install repast4py command.
Using an image this large isn't impossible, but it causes problems with storing it in a limited free repo, building it with CI/CD, moving it to nodes in the cluster, etc.
Is there a simple way to reduce the build size? I can't really believe that it's using 8 GB of compiled C code to run repast4py.
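One way to see what is actually filling that layer is to rank the top-level entries of site-packages by size from inside the container. This is a minimal sketch, not part of repast4py: the helper names dir_size and largest_entries are made up for illustration, and it assumes the packages were installed into the interpreter's default purelib directory.

```python
import os
import sysconfig


def dir_size(path):
    """Total size in bytes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(base, top=10):
    """Return (name, bytes) for the largest top-level entries in base, biggest first."""
    sizes = []
    for entry in os.scandir(base):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, dir_size(entry.path)))
        elif entry.is_file(follow_symlinks=False):
            sizes.append((entry.name, entry.stat().st_size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Assumption: packages live in the default site-packages of this interpreter.
    site_packages = sysconfig.get_paths()["purelib"]
    for name, size in largest_entries(site_packages):
        print(f"{size / 1e6:10.1f} MB  {name}")
```

Running this with the image's python3 lists the ten largest packages, which should make it obvious whether one dependency dominates the layer.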
After having a look at issue #68, I tried a CPU-only version of Torch. This moved the large layer to the pip install -r ./requirements.txt step, which implies that Torch was previously being installed by the RUN env CC=mpicxx CXX=mpicxx pip install repast4py step rather than the requirements.txt step.
So it looks like the problem was Torch all along. I'm not planning on using GPU support for now, and ~1.5 GB is much better:
$ docker history 9a1cb25cd860
IMAGE CREATED CREATED BY SIZE COMMENT
9a1cb25cd860 2 minutes ago /bin/sh -c #(nop) ENV PYTHONPATH=/repast4py… 0B
2e321edc7386 2 minutes ago /bin/sh -c env CC=mpicxx CXX=mpicxx pip inst… 28.4MB
64236b6f3e79 3 minutes ago /bin/sh -c pip install -r ./requirements.txt 1.42GB
00430ad1e55c 8 minutes ago /bin/sh -c #(nop) COPY file:e3a9b2a1f65788bd… 418B
a45824896792 About an hour ago /bin/sh -c apt-get update && apt-get ins… 340MB
73b513f59526 2 years ago /bin/sh -c #(nop) CMD ["python3"] 0B
<missing> 2 years ago /bin/sh -c set -ex; savedAptMark="$(apt-ma… 9.51MB
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_SHA256… 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_URL=ht… 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_SETUPTOOLS_VER… 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=21… 0B
<missing> 2 years ago /bin/sh -c cd /usr/local/bin && ln -s idle3… 32B
<missing> 2 years ago /bin/sh -c set -ex && savedAptMark="$(apt-… 29.5MB
<missing> 2 years ago /bin/sh -c #(nop) ENV PYTHON_VERSION=3.10.0 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV GPG_KEY=A035C8C19219B… 0B
<missing> 2 years ago /bin/sh -c set -eux; apt-get update; apt-g… 3.11MB
<missing> 2 years ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 2 years ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
<missing> 2 years ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 2 years ago /bin/sh -c #(nop) ADD file:ece5ff85ca549f0b1… 80.4MB
Requirements.txt
# Using python:3.10.0-slim
# Use CPU Torch for a smaller install
--extra-index-url https://download.pytorch.org/whl/cpu
torch # https://github.com/Repast/repast4py/issues/68
mpi4py==4.0.0 #https://pypi.org/project/mpi4py/
numpy==1.26.4 #2.1.1
pandas==2.2.3
numba==0.60.0
coverage==7.6.1
networkx==3.3
pyyaml==6.0.2
Cython==3.0.11
llvmlite==0.43.0
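To confirm that the CPU-only wheel was actually picked up, the installed version string can be checked without importing torch. This is a rough sketch under one assumption: that wheels from the CPU index carry a "+cpu" local version label (for example 2.x.y+cpu), which is the convention I have seen on Linux but is not guaranteed on every platform. The function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def looks_like_cpu_build(version_string):
    """True if the version carries a '+cpu' local label (assumed wheel naming convention)."""
    return version_string.partition("+")[2] == "cpu"


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    else:
        kind = "CPU-only" if looks_like_cpu_build(found) else "possibly GPU-enabled"
        print(f"torch {found}: {kind}")
```

A version without the label is not proof of a GPU build, so treat a "possibly GPU-enabled" result as a prompt to check the layer size rather than a definitive answer.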
I did a quick search in the git repo to see where Torch was being used:
- Requirements.txt
- a bunch of mentions in tests/*.py and examples/*.py
- a random seed generator in src/repast4py/random.py
- some mentions in the geometry.py and value_layer.py
This might be a silly question, but is Torch integral to Repast4Py?
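The layer sizes above suggest pip pulled Torch in as a dependency of repast4py. One way to check this directly is to read the requirements that the installed distribution declares in its metadata. The sketch below uses only the standard library; the helper names are hypothetical, and the name parsing is deliberately simple rather than a full requirement-string parser.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares, or [] if none or not installed."""
    try:
        return requires(dist_name) or []
    except PackageNotFoundError:
        return []


def requirement_name(req):
    """Extract the bare project name from a requirement string such as 'torch>=1.0'."""
    name = req.split(";", 1)[0]  # drop environment markers
    for sep in ("[", "(", "<", ">", "=", "!", "~", " "):
        name = name.split(sep, 1)[0]  # drop extras and version specifiers
    return name.strip().lower()


def depends_on(dist_name, wanted):
    """True if dist_name is installed and lists wanted among its declared requirements."""
    names = {requirement_name(r) for r in declared_requirements(dist_name)}
    return wanted.lower() in names


if __name__ == "__main__":
    print(depends_on("repast4py", "torch"))
```

If this prints True inside the container, Torch is a declared install requirement, so pip will always fetch some build of it unless a matching version is already installed.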
Thanks for sharing your experience. I created #68 when encountering the same issue with creating Docker images. I wasn't able to get the docker image smaller than about what you are showing. I'll look into further ways to reduce the size of the repast4py installs based on individual use case requirements.
Thanks for looking into this! Let me know if I can help in any way.