Improvements to the Docker installation and usage procedures
A placeholder issue to track the list of tasks to be done for the current Docker images, established in #3223.
- [x] Ensure that macOS M-series users can pull the latest PyBaMM image from DockerHub without the use of the `--platform linux/x86_64` command-line flag
- [x] Add relevant instructions for developers in order to contribute to PyBaMM (setting up the `git` configuration, setting a `git` remote, ~~documenting the addition or configuration of SSH and GPG keys, and other credentials~~) (tracked by #3364)
- [ ] Enable the display of `matplotlib` plots for examples that use QuickPlot and similar, either by switching to the TkAgg backend in the code or via a `matplotlibrc` file, or otherwise document this change in an FAQ or Troubleshooting section
- [x] Enable root permissions in the Dockerfile by setting the user ID to 1000, so that users can install `apt` dependencies and do other tasks inside the container (#3947)
- [x] Add CI to build and push the Docker images either nightly along with nightly releases (#3251) or on commits pushed to the `develop` branch, so that those who pull the image from DockerHub will always have the latest changes and will not have to do `git pull` every time. We can refer to PyHF's workflows for this. (tracked by #3316)
- ~~Add a parallel job to run the entire test suite inside the Docker container in the scheduled tests workflow (this can also be done alongside the previous point in a separate workflow file). The push to DockerHub will not get triggered if any changes fail~~
- [ ] Push DockerHub images on releases, starting from either 23.9 or 24.1? Note: this would be more helpful to users rather than developers (#3661)
- [x] Expose ports for smooth integration of Jupyter notebooks and briefly document how to run them inside a container
- [x] Expose ports for building the documentation with `nox -s docs`/`sphinx-autobuild`
- [x] Install all extras dependencies by default except the optional solvers, which are provided through command-line arguments, and install `pandoc` and `graphviz` (required for the rendering of Jupyter notebooks and inheritance diagrams in the built documentation). With `apt`, users might get an older version of `pandoc`, so we can try getting it from `conda-forge` instead. (tracked by #3317)
- [ ] Add functionality for GitHub Codespaces in order to enable an online development environment, and add associated documentation about their usage and billing.
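As an illustration of the UID 1000 item above, a Dockerfile fragment along these lines could create a non-root user that can still install `apt` packages via `sudo` (the base image and the `pybamm` username here are placeholders, not necessarily what the actual Dockerfile uses):

```dockerfile
FROM ubuntu:22.04

# Create a non-root user with UID 1000 and grant passwordless sudo, so that
# users can install apt dependencies inside a running container.
RUN apt-get update && apt-get install -y sudo && \
    useradd --create-home --uid 1000 --shell /bin/bash pybamm && \
    echo "pybamm ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

USER pybamm
WORKDIR /home/pybamm
```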
I tried it on my end with the current instructions and Git was set up for me automatically. It only prompted me to log in with my GitHub user, but I could pull and push easily (note that I have push access to the PyBaMM repo, so people who don't will have to set the remote).
I tried it on my end as well and I faced some issues too. Here are some more things that we will have to take care of:
- I had to install the Dev Containers extension from Microsoft before the "Attach Visual Studio Code" dialogue option showed up in the tooltips for me.
- There are a lot of ancillary yet redundant files present inside the `PyBaMM/` folder, like the `setup.log` file and `__pycache__` directories. We should use a `.dockerignore` file in conjunction with `.gitignore` so that they do not arise after the container has been built (or explore better ways to do this; this is just my initial guess). The `pybamm.egg-info/` and `build/` directories should be kept.
- As expected, I could not create signed commits and did not have SSH or GPG configured for me automatically. However, the git config was already set (`git config --list`) and it asked me for my GitHub credentials (similar to Ferran's experience). We should prompt users reading the documentation to set the remote tracking to their own forks if they do not have push permissions (though if only the maintainers end up using it, it would be fine as-is).
- For some reason, I did not have `nox` installed when I ran `conda activate pybamm`. Is this intentional? Should we add `[all,dev,docs]`, or perhaps install `nox` beforehand during the build steps? I temporarily disabled the notebooks in the documentation builds to avoid errors with Pandoc not being installed, and I was able to browse on port 8000 without issues.
- We could have a separate `noxfile` for a Docker build, because `nox` created its own virtual environment directory at `.nox/docs/` and installed the `[all,docs]` dependencies into it despite the fact that they were already installed, which is redundant behaviour and only slows down the builds. This was repeated in the `unit` and `integration` environments. Otherwise, we could expose another environment variable, `DOCKER_INSTALL`, with the value 1 and read it inside `noxfile.py`, as we are already doing for the CI. Note: `nox` also supports `conda` environments as build backends rather than just `virtualenv`, so we could point them to where the `conda` packages are installed, since the Docker image uses `conda`.
- I was able to run Jupyter notebooks without issues as well.
- It seems like `pybamm-requires` was run twice when the image was built and pushed, because I can see two tarballs for SuiteSparse and SUNDIALS (appended with `(1)` in the name).
- Running the unit tests with the optional solvers (i.e., Jax) on ARM-based macOS machines returns an error:

  ```
  RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. You may be able to work around this issue by building jaxlib from source.
  ```

  This should be related to using `--platform linux/x86_64` when spinning up an image. We might need to build two sets of Docker images, one per platform (x86_64 and arm64), to fix this, just like we build Linux wheels on both architectures.
Addendum: creating the Docker image on ARM-based machines with the `JAX` argument also fails; here is the error log:

```
ERROR: Could not find a version that satisfies the requirement jaxlib==0.4.7; extra == "jax" (from pybamm[all,dev,docs,jax,odes]) (from versions: none)
ERROR: No matching distribution found for jaxlib==0.4.7; extra == "jax"
```

and specifying the `--platform linux/x86_64` flag with the `docker build` command seems to put it in a deadlock: this step in particular kept running for about 20 minutes, at which point I stopped the process.
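One possible way to address the platform mismatch above would be to build the images for both architectures in one go with Docker's buildx; a sketch along these lines, where the image name and tag are illustrative:

```shell
# Build and push a multi-arch image for x86_64 and arm64 in a single step;
# requires a buildx builder with QEMU emulation set up beforehand
# (e.g. via `docker buildx create --use`).
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag pybamm/pybamm:latest \
  --push .
```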
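On the `DOCKER_INSTALL` idea mentioned earlier: one way to implement it is a small guard in `noxfile.py` that skips reinstalling extras when the variable is set. This is only a sketch under the assumption that sessions install the package with extras; the helper name and the extras string are illustrative, not the actual noxfile code:

```python
import os


def install_extras(session_install, extras="[all,docs]"):
    """Call ``session_install`` (for example, nox's ``session.install``)
    with the package extras, unless DOCKER_INSTALL=1 signals that the
    Docker image already ships with these dependencies installed."""
    if os.environ.get("DOCKER_INSTALL") == "1":
        return False  # skipped: dependencies are baked into the image
    session_install("-e", f".{extras}")
    return True
```

Each session would then call this helper instead of `session.install` directly, so local builds keep their current behaviour while Docker builds avoid the redundant installs.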
@arjxn-py, when you have time – let me know if we can look at this task together: "Push DockerHub images on releases, starting from either 23.9 or 24.1" before the 24.1rc0 release. I feel this would be relatively easy and quick to set up, considering that we would just need to add

```yaml
on:
  release:
    types: [published]
```

to the workflow and add a tag that contains the version number in the "Build and push Docker image" job step. In my opinion, publishing just the IDAKLU image would be fine (installing `[jax]` and `[odes]` is trivial now, and they are optional dependencies too), but since the IDAKLU solver is bundled with the PyPI wheels, the Docker images should follow suit.
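For reference, a minimal sketch of what that trigger plus a version-derived tag could look like, assuming the `docker/metadata-action` and `docker/build-push-action` actions and the `pybamm/pybamm` image name (all illustrative, not taken from the repository's actual workflow):

```yaml
on:
  release:
    types: [published]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Derive image tags from the release version, plus a "latest" tag
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: pybamm/pybamm
          tags: |
            type=semver,pattern={{version}}
            type=raw,value=latest
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
```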
> @arjxn-py, when you have time – let me know if we can look at this task together:
Yes, definitely; I was also going to pick it up. I'm just a bit busy with some difficult end-semester exams 😅, but hopefully I'll be on this over the weekend.
Hey @agriyakhetarpal, I think I shall open a new PR from my fork to keep it clean; you have push access, so we can look into it together. Let me know accordingly.
Happy to start doing it together