PyBaMM icon indicating copy to clipboard operation
PyBaMM copied to clipboard

Migrated docker image from miniconda to manylinux2014

Open santacodes opened this issue 11 months ago • 7 comments

Description

Migration from conda virtual environment to venv and using Centos 7 based manylinux2014 as base image for docker which is compliant with PEP599.

Fixes #3692 and fixes #3666

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Optimization (back-end change that speeds up the code)
  • [ ] Bug fix (non-breaking change which fixes an issue)

Key checklist:

  • [x] No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • [x] All tests pass: $ python run-tests.py --all (or $ nox -s tests)
  • [x] The documentation builds: $ python run-tests.py --doctest (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ python run-tests.py --quick (or $ nox -s quick).

Further checks:

  • [x] Code is commented, particularly in hard-to-understand areas
  • [ ] Tests added that prove fix is effective or that feature works

santacodes avatar Mar 07 '24 13:03 santacodes

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.60%. Comparing base (8056d22) to head (abc16d0).

:exclamation: Current head abc16d0 differs from pull request most recent head 5d8417b. Consider uploading reports for the commit 5d8417b to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3874      +/-   ##
===========================================
- Coverage    99.60%   99.60%   -0.01%     
===========================================
  Files          259      259              
  Lines        21273    21270       -3     
===========================================
- Hits         21189    21186       -3     
  Misses          84       84              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 07 '24 13:03 codecov[bot]

Here's a list of what else we would need from this PR, besides these changes:

  1. Update the installation guide in the documentation
  2. Adjust the GitHub Actions workflows to push the pybamm/pybamm:latest image as the one with all solvers, i.e., tag it with latest
  3. Adjust the base image being pulled in order to retain arm64 support
  4. A CHANGELOG entry in the "Breaking changes" section, since this shall be a user-facing change and affects developer installations

Also, could we get an estimate on the compressed size of the image? The ones we currently push are ~1.6 GB, so remaining close to somewhere around that would be great.

agriyakhetarpal avatar Mar 07 '24 16:03 agriyakhetarpal

Also, could we get an estimate on the compressed size of the image? The ones we currently push are ~1.6 GB, so remaining close to somewhere around that would be great.

Sure I'd try to minimize the image size as much as possible and update you on the estimate

santacodes avatar Mar 07 '24 16:03 santacodes

python base image looks much more valid as compared to manylinux, I'd request you to run test suite inside the running docker container at least once and see how many tests are getting skipped. Plus confirm the solver availability inside the container by executing:

  • pybamm.have_idaklu()
  • pybamm.have_jax()
  • pybamm.have_scikits_odes()

All of the three methods return true inside the docker container running python venv. I also did run the test suite before pushing the commit, 1 test in each unit and integration tests were getting skipped, I am not exactly sure of the cause. I am awaiting on @agriyakhetarpal response on testing it on ARM.

santacodes avatar Mar 09 '24 15:03 santacodes

Upon further researching on PEP599 the manylinux2014 platform tag which demands for the container to be compliant with GLIBC<=2.17, almost all of the linux distributions fulfilling that constraint have reached EOL, except CentOS 7 which will reach its EOL in June, 2024.

Most of the docker containers which have a constant maintenance use Debian/Ubuntu, RHEL or Alpine based distributions in their containers to maintain their integrity.

In reference to the above distributions, these are the OS releases that fulfill glibc<=2.17 -

  • Debian Wheezy which had glibc 2.13 and reached its end of life in May 2018. The packages are now archived and not constantly maintained which might cause some packages to break.
  • Ubuntu 13 Saucy which had glibc 2.17 and reached its EOL in July 2014.
  • CentOS 7 which still runs glibc 2.17 will reach EOL in June 2024 is the primary base image used by PEP599 manylinux2014 policy, but the downside of it in context to this project is multi-architecture building. Manylinux2014 provides A single build image and auditwheel profile per architecture which would mean that:
    • We would either have to use 2 different dockerfiles to build for x86_64 and aarch64 distinctly as the base images have a discrete architecture or
    • Use a multistage build to pull and push different images for each of them. This would also complicate things a bit as CentOS 7 would reach EOL soon.

To not do things the hacky way the official Python Bullseye image would make things a bit simple as the base image itself supports multiple architectures and is based on Debian. The size would also not be an issue as there's a slim version of the image and would have a constant support and maintenance for a while. Upgrading to a newer version or migrating would also become easier as the dependencies would be compatible. By default the Python Bullseye image provides glibc 2.31 which provides backward compatibility and wouldn't have any issues running in a container.

Reference

santacodes avatar Mar 09 '24 16:03 santacodes

Thanks for looking into this @santacodes – yes, a Python bullseye image makes sense too as @arjxn-py mentions. The manylinux2014 images are indeed architecture-specific implementations of PEP 599 and therefore won't have support for multiple architectures from a single FROM command. The specification for having glibc < 2.17 is just for ABI compatibility reasons, and does not need to be enforced for a Docker image distribution – it is just for building the wheels.

@santacodes when pulling this branch I noticed that these changes have been made on the develop branch on your fork, and not a feature branch. This is generally not good practice since it causes issues with syncing the repository if you wish to contribute further, and a host of other reasons.

Could you create a new PR based on these changes, and close this one? You can check out a new branch from the current changes and that should bring your commits there (or you can cherry-pick the commits too – whichever you find easier).

agriyakhetarpal avatar Mar 09 '24 16:03 agriyakhetarpal

Actually, testing this on ARM on macOS has revealed a bug in our current PyBaMM Docker images that we have been pushing to Docker Hub since some while:

ImportError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 idaklu = importlib.util.module_from_spec(idaklu_spec)

File <frozen importlib._bootstrap>:573, in module_from_spec(spec)

File <frozen importlib._bootstrap_external>:1233, in create_module(self, spec)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

ImportError: /home/pybamm/PyBaMM/pybamm/solvers/idaklu.cpython-311-aarch64-linux-gnu.so: undefined symbol: _ZN6casadi8Function11deserializeERKSs

this unresolved symbol _ZN6casadi8Function11deserializeERKSs comes from CasADi upstream (see https://github.com/casadi/casadi/issues/2887, https://github.com/dockcross/dockcross/issues/753, https://github.com/dockcross/dockcross/issues/780) and is quite challenging to fix. This error is the same as what I face when testing this new bullseye image – I remember facing this on #3462 and could not find a way towards resolution.

We can keep this PR open for a while – what I remember is that this did work on aarch64 earlier, but somewhere down the line this ended up breaking and therefore the IDAKLU solver cannot be instantiated

agriyakhetarpal avatar Mar 09 '24 18:03 agriyakhetarpal

Closed this due to messy branches on my side, will make a new draft with those previous commits for future reference.

santacodes avatar Mar 18 '24 08:03 santacodes