[Bug]: IDAKLU solver does not work when running in an aarch64 Docker container (unresolved symbols from CasADi)

Open agriyakhetarpal opened this issue 11 months ago • 9 comments

PyBaMM Version

develop

Python Version

3.11

Describe the bug

See https://github.com/pybamm-team/PyBaMM/pull/3874#issuecomment-1986939495 for more details.

I came across this when testing the most recent Docker image for PyBaMM on Docker Hub.

Steps to Reproduce

On an arm64 (M-series) macOS machine with Docker:

  1. Pull the Docker image with docker pull pybamm/pybamm:idaklu
  2. docker run -it pybamm/pybamm:idaklu
  3. python -c "import pybamm; pybamm.IDAKLUSolver()"

displays the following:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/pybamm/PyBaMM/pybamm/solvers/idaklu_solver.py", line 118, in __init__
    raise ImportError("KLU is not installed")
ImportError: KLU is not installed

which upon further debugging reveals (see logs below)

Relevant log output

ImportError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 idaklu = importlib.util.module_from_spec(idaklu_spec)

File <frozen importlib._bootstrap>:573, in module_from_spec(spec)

File <frozen importlib._bootstrap_external>:1233, in create_module(self, spec)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

ImportError: /home/pybamm/PyBaMM/pybamm/solvers/idaklu.cpython-311-aarch64-linux-gnu.so: undefined symbol: _ZN6casadi8Function11deserializeERKSs

agriyakhetarpal, Mar 09 '24 18:03
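
The log above comes from loading the compiled extension directly instead of going through `pybamm.IDAKLUSolver()`, which reports only "KLU is not installed". A minimal sketch of that debugging step follows. The module name `pybamm.solvers.idaklu` is an assumption inferred from the `.so` path in the log, and `load_error` is a hypothetical helper name, not part of PyBaMM.

```python
import importlib.util


def load_error(module_name):
    """Return the message of the ImportError raised while loading
    module_name, or None if the module loads cleanly."""
    try:
        spec = importlib.util.find_spec(module_name)
        if spec is None:
            return f"no module named {module_name!r} was found"
        # For extension modules the dynamic linker runs during loading,
        # so an unresolved symbol surfaces here as an ImportError.
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    except ImportError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Module name assumed from the .so path shown in the log.
    print(load_error("pybamm.solvers.idaklu"))
```

Run inside the container, this should print the full linker message (including the undefined symbol) rather than the generic error raised by the solver's constructor.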

Labelled this as a medium-priority issue because we don't have wheels for aarch64 Linux right now anyway (#3462), so users who are on such architectures are probably building PyBaMM from source already (if they wish to use the IDAKLU solver, that is).

agriyakhetarpal, Mar 09 '24 18:03

@arjxn-py, is there a reason why we are using continuumio/miniconda3:latest, i.e., the latest-tagged image? Ideally, we should have pinned this to a particular tag to make the build reproducible – I believe that the aarch64 image was working earlier when we were merging things (I had tested it), so I am not sure exactly when or how this bug appeared.

agriyakhetarpal, Mar 09 '24 18:03
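
As a rough illustration of the pinning concern raised above, the following sketch flags `FROM` lines in a Dockerfile whose base image has no tag or uses `latest`. `unpinned_base_images` is a hypothetical helper written for this note, not an existing tool. It is deliberately simple: it would also flag references to earlier build stages and `scratch`.

```python
def unpinned_base_images(dockerfile_text):
    """Return image references from FROM lines that have no tag or use :latest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        # Skip options such as --platform=linux/arm64.
        refs = [t for t in tokens[1:] if not t.startswith("--")]
        if not refs:
            continue
        image = refs[0]
        if "@" in image:
            # A digest (image@sha256:...) already pins the image.
            continue
        # Look for the tag only in the last path component, so a
        # registry port such as localhost:5000/img is not mistaken for one.
        last = image.rsplit("/", 1)[-1]
        _, sep, tag = last.partition(":")
        if not sep or tag == "latest":
            unpinned.append(image)
    return unpinned


if __name__ == "__main__":
    print(unpinned_base_images("FROM continuumio/miniconda3:latest\n"))
```

A check like this could run in CI so that an unpinned base image is caught before an image is published.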

#3874 will have to remain stalled until this is resolved (and we should look into this before #3666 and #3692).

agriyakhetarpal, Mar 09 '24 18:03

We did have CMake pinned (cmake==3.22) in #3223 when we pushed the images initially, but this was unpinned later – I am looking at this locally to see if reverting to that works

Edit: no luck so far with that

agriyakhetarpal, Mar 09 '24 19:03

> is there a reason why we are using continuumio/miniconda3:latest, i.e., the latest-tagged image?

There was no particular reason to use the latest tag; I just hadn't anticipated that it might cause an issue later on. What we can do now is try pinning to tags that are 4 to 10 months old (I can see 3-4 of them). I am not sure I can reproduce this aarch64 error locally (I may not have the right architecture), so I'll push updated branches to my fork and let you know so you can test them.

arjxn-py, Mar 12 '24 22:03

> I am not sure I can reproduce this aarch64 error locally (I may not have the right architecture), so I'll push updated branches to my fork and let you know so you can test them.

Thanks, actually I did test the last three tags for the miniconda image by building it and running the container locally, and also pinned cmake==3.22 – it did not work and returned the same error. Maybe we'll need to just use an older gcc or something by grabbing it off conda, rather than through apt, and hopefully that can fix it – we can't use a version that is too old, however.

agriyakhetarpal, Mar 12 '24 22:03
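
Under the Itanium C++ ABI used by GCC, the undefined symbol from the log decodes to `casadi::Function::deserialize(std::string const&)`, where the trailing `Ss` is the abbreviation for the pre-C++11 `std::string` of libstdc++. A library built with the newer string ABI would export the same function with `__cxx11` in its mangled name. That would be consistent with a compiler or ABI mismatch between the `idaklu` extension and the CasADi library, though nothing in this thread confirms that diagnosis. A rough sketch for classifying such symbols follows. `split_nested_name` and `string_abi` are hypothetical helpers that handle only simple `_ZN...E` names; they are not a full demangler.

```python
import re


def split_nested_name(symbol):
    """Split a simple Itanium-mangled nested name into (components, rest).

    Only the plain `_ZN<len><name>...E<params>` form is handled.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("only simple _ZN...E nested names are handled")
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported mangling at offset {pos}")
        length = int(match.group())
        pos += len(match.group())
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing terminating 'E'")
    return parts, symbol[pos + 1:]


def string_abi(symbol):
    """Heuristically classify which libstdc++ std::string ABI a symbol uses."""
    _, params = split_nested_name(symbol)
    if "__cxx11" in symbol:
        return "cxx11"
    if "Ss" in params:
        # 'Ss' abbreviates the pre-C++11 std::string.
        return "pre-cxx11"
    return "unknown"


if __name__ == "__main__":
    symbol = "_ZN6casadi8Function11deserializeERKSs"
    print(split_nested_name(symbol), string_abi(symbol))
```

Comparing this classification with the symbols that the installed CasADi shared library actually exports (for example, as listed by `nm -D`) would show whether the two sides disagree on the string ABI.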

Possible method of resolution, only an idea for now: build CasADi from source in the images (for both architectures). We don't need to compile interfaces to the many solvers and frameworks available, just the Python/SWIG bindings, so the build should take ~2 minutes – a fine compromise.

The Linux source installation instructions (https://github.com/casadi/casadi/wiki/InstallationLinux) are actively maintained and kept up to date.

agriyakhetarpal, Apr 02 '24 03:04

> Possible method of resolution, only an idea for now: build CasADi from source in the images (for both architectures). We don't need to compile interfaces to the many solvers and frameworks available, just the Python/SWIG bindings, so the build should take ~2 minutes – a fine compromise.
>
> The Linux source installation instructions (https://github.com/casadi/casadi/wiki/InstallationLinux) are actively maintained and kept up to date.

Might as well try that. If no one is already working on it, I'd love to try this as a PoC.

santacodes, Apr 06 '24 01:04

Sure, we would love the help, @santacodes!

agriyakhetarpal, Apr 06 '24 08:04