PyBaMM icon indicating copy to clipboard operation
PyBaMM copied to clipboard

Migrate to `scikit-build-core`

Open cringeyburger opened this issue 1 year ago • 50 comments

Description

This PR is about migrating to a new build system, scikit-build-core from setuptools and wheel, while incorporating necessary changes in GitHub Workflows, Benchmarks and Documentation.

CHANGES IN DISCUSSION:

  • [x] Task 27: Use SUNDIALS installation script to set the environmental variables for windows --- We currently don't support Windows users to run nox -s pybamm-requires or install_KLU_Sundials.py script. Adding this feature would change a lot of things

CHANGES REQUESTED:

  • [x] Task 22: Remove images from .gitignore
  • [x] Task 23: Include the whole sundials_KLU_libs in .gitignore
  • [x] Task 24: Remove import sys from bottom call in test_idaklu_solver
  • [x] Task 25: Skip the parameter_sets test
  • [x] Task 26: Change the version requirements given the scikit-build-core docs
  • [x] Task 28: Update docs for DAE solver (remove extra text from the tabs)

CHANGES IN PROGRESS:

COMPLETED, need reviews:

  • [x] Task 4: Remove pins on other build-time dependencies like CasADi --- unpinned all instances of build-time dependencies, including scikit-build-core in the direction to remove editable-rebuilds, or at least give a warning to the users before using it.
  • [x] Task 5: Remove pip update commands from nox.
  • [x] Task 13: Remove set CMAKE_GENERATOR="Visual Studio 17 2022" and set CMAKE_GENERATOR_PLATFORM=x64 and let CMake set it by itself.
  • [x] Task 14: Set defaults for Windows for the remaining 4 environment variables in CMakeLists.txt. Use "if set, use that otherwise use default" logic for VCPKG_ROOT (rename VCPKG_ROOT_DIR to VCPKG_ROOT) --- Can not fix using CMake variables, have to set environment variables, hence wrote a batch script to set the env vars permanently for the Windows user. The CI still use pre-defined env vars because this avoids issues related to interactive scripts.
  • [x] Task 19: Clean the find BLAS script
  • [x] Task 3: Run tests, including IDAKLU ones, on all platforms (except doctests). --- random windows IDAKLU tests fail because of workers crashes. IREE tests had to be skipped because of attribute failure. Rest linux and macos tests are working.
  • [x] Task 1: Check if the error in Doctests (pybamm.parameter_sets) can be resolved using an example instead of explicitly setting all the names. --- need to set all names
  • [x] I unpinned scikit-build-core but will keep it as an option in the docs. I will write a warning saying that it is experimental.
    • [x] Task 2: Set the scikit-build-core pin to 0.10.5 from 0.10.3
    • [x] Task 6: Write a clearer message about editable-rebuild command, which is supposed to help developers using partial rebuilds (it was written to use that command to contribute to PyBaMM).
    • [x] Task 7: Remove the verbose flag from the editable-rebuilds command
  • [x] Task 12: Set python/python3 instead of python3.X in TOML kit installation in the guide.
  • [x] Task 15: Remove ambiguity from # in the PyBaMM/ directory
  • [x] Task 16: Migrate all Windows instructions to the existing Install from source doc.
  • [x] Task 20: Add pytest documentation from Contributors guide in Develop branch.
  • [x] Task 10: Include new "build" and "download_KLU_Sundials" directories in .gitignore
  • [x] Task 18: Include *.egg-info pattern in .gitignore
  • [x] Task 9: Update docs asking users to install VCPKG themselves.
    • [x] Task 8: Remove images from the installation guide.
    • [x] Task 11: Install VCPKG in a local directory
    • [x] Task 17: We do not need the MSVC image
  • [x] Task 18: Unpin graphviz from docs and tests --- had to pin on tests because of permission errors, though it would work fine locally.
  • [X] Task 21: Add IREE Support

cringeyburger avatar Aug 16 '24 14:08 cringeyburger

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.48%. Comparing base (3bf3ea8) to head (a51b095). Report is 7 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4352      +/-   ##
===========================================
- Coverage    99.42%   95.48%   -3.94%     
===========================================
  Files          299      299              
  Lines        22691    22715      +24     
===========================================
- Hits         22560    21690     -870     
- Misses         131     1025     +894     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Aug 16 '24 14:08 codecov[bot]

With this commit, editable + partial rebuilds are available through nox -s dev. Note that we have to export BUILD_IDAKLU=ON to enable CMake and thus IDAKLU build

cringeyburger avatar Aug 17 '24 00:08 cringeyburger

Updates till now:

  1. Wheels are built properly, and the package can be installed simply with pip install .[...].
  2. CI Builds pass except for macOS, which can't find Fortran (which is actually installed before the build step) to find SuiteSparse libraries and BLAS. This shouldn't be a problem on local if gfortran is installed through brew.
  3. IDAKLU is compiled properly and is installed during wheel builds and installation, given we have exported BUILD_IDAKLU=ON before running the build/install commands. Otherwise, a pure python package is installed.
  4. Editable installs with rebuilds are working, and can be run using pip install --no-build-isolation --config-settings=editable.rebuild=true -Cbuild-dir=build -ve.[...], provided we have the build-time dependencies, such as cmake, pybind11 etc., are installed beforehand.
  5. This is automated by running nox -s dev which installs the build-time dependencies. Note that this feature requires tomlkit to be installed before running any nox command. Hence it is better to install nox and tomlkit hand in hand.
  6. It is to be noted that with editable installs and rebuilds, the IDAKLU solver does not compile with changes in SuiteSparse and SUNDIALS files. This is because SuiteSparse and SUNDIALS aren't interfaced with pybind11 directly, and thus are not linked with the compiled artifact. Apart from this, the editable installs work properly with working IDAKLU solver.
  7. Docker builds are working, both locally and on CI.
  8. Doctests and other tests should work now since nox and tomlkit are installed together on CI.

cringeyburger avatar Aug 17 '24 11:08 cringeyburger

Problem with Benchmarks:

  1. The installation script installs the libs to PyBaMM/sundials_KLU_libs.
  2. CMake links them using ${CMAKE_CURRENT_SOURCE_DIR}/sundials_KLU_libs.
  3. The thing with ASV on CI is that the script responsible for installing the packages installs them in /home/runner/work/PyBaMM/PyBaMM/sundials_KLU_libs/.
  4. But due to the build command: python -m build --wheel -o {build_cache_dir} {build_dir}, it makes a separate venv+pip environment, and the path to the libs is changed to /home/runner/work/PyBaMM/PyBaMM/env/ec88222d46a18b5e9d12ba7194d09478/project/sundials_KLU_libs

Because of this, CMake isn't finding the libs and is failing to build them.

cringeyburger avatar Aug 17 '24 13:08 cringeyburger

Updates: With this commit, the following things are covered:

  1. Main migration to scikit-build-core with IDAKLU turned off as default. This is overridden on CI jobs and by exporting BUILD_IDAKLU=ON
  2. Editable installs are now available with nox -s dev, which utilises tomlkit to install the build-time dependencies
  3. CI builds are working for all three platforms
  4. Docker builds are updated and working
  5. Doctests, tests and benchmarks are working

cringeyburger avatar Aug 18 '24 14:08 cringeyburger

Okay, now they work :)

cringeyburger avatar Aug 18 '24 17:08 cringeyburger

After setting the compilation of the IDAKLU solver to default and providing a fallback, I ran workflows of tests, benchmarks, CI wheels' builds, and docker builds, and they all worked. Please note that I set BUILD_IDAKLU=OFF for tests where IDAKLU wasn't needed and fails the builds if set to ON.

cringeyburger avatar Aug 18 '24 21:08 cringeyburger

With this commit, documentation is complete for Install from source: Windows with improvements in other documentation.

cringeyburger avatar Aug 20 '24 19:08 cringeyburger

I am not sure if the pre-commit's blacken-docs error relates to the changes I made.

EDIT: Found the problem

cringeyburger avatar Aug 20 '24 19:08 cringeyburger

The error (warning) in doctests is a quick fix: two links have the same name. I will change it as soon as all tests are okay.

cringeyburger avatar Aug 20 '24 21:08 cringeyburger

Seems like there is a problem upstream for scipy==1.14.1, I am pinning it to 1.14.0

cringeyburger avatar Aug 21 '24 05:08 cringeyburger

It was the case of a missing M-series wheel for Python 3.12, which seems to have been fixed since the upload was processed manually. Could you unpin and check, @cringeyburger?

agriyakhetarpal avatar Aug 21 '24 09:08 agriyakhetarpal

Thanks @kratman for reviewing the PR, especially for the documentation changes that have been added. Ankit is wrapping up his project and needs to write his final report for submission, so I'll try to find some time and prioritise reviewing this tomorrow and over the weekend.

On that note, @santacodes finished up his project today and is now looking to start with the IDAKLU solver separation milestone once he publishes his final report. @valentinsulzer and I were discussing with him some of the specifics around this in our meeting today, and whether it could be done in the course of a week or before we start with the v24.9 release cycle, which is not only subject to Santhosh's and Ankit's capabilities, but also to this PR getting merged, so the sooner it will be possible to, the better

agriyakhetarpal avatar Aug 22 '24 23:08 agriyakhetarpal

On that note, @santacodes finished up his project today and is now looking to start with the IDAKLU solver separation milestone once he publishes his final report. @valentinsulzer and I were discussing with him some of the specifics around this in our meeting today, and whether it could be done in the course of a week or before we start with the v24.9 release cycle, which is not only subject to Santhosh's and Ankit's capabilities, but also to this PR getting merged, so the sooner it will be possible to, the better

Yeah that is why I was looking at this PR. To help with the separation task.

kratman avatar Aug 22 '24 23:08 kratman

Due to the large number of conversations, it is becoming harder to track all the changes requested, changes in discussion and progress. To make it easier, I will be updating the first/top PR message after each update.

cringeyburger avatar Aug 23 '24 15:08 cringeyburger

@agriyakhetarpal Is this supposed to happen? I tried running windows test with IDAKLU on.

cringeyburger avatar Aug 24 '24 08:08 cringeyburger

test_symbol_visualise can be skipped, could you add a pytest condition there to skip it if the latex program is not found? You can use shutil.which() to find it. As for the IDAKLU-JAX test, I'm not too surprised, because we haven't run the tests against the solver on PRs before – I knew there would be some failures coming up. I would suggest running them again, and if they keep failing, we can mark them with pytest.mark.xfail and fix them later, since they aren't related to the build system migration.

agriyakhetarpal avatar Aug 24 '24 14:08 agriyakhetarpal

@agriyakhetarpal Check this too. IREE is only for MacOS 13 and higher + python 3.11, and it is the only failing test. I was thinking of setting PYBAMM_IDAKLU_EXPR_IREE=OFF, i.e. by editing it in noxfile for time being till this is fixed

EDIT:

test_symbol_visualise can be skipped...

I found out that it was a problem with using the latest version of graphviz on windows. I tried running it using 8.0.5, and it works without a problem.

As for the IDAKLU-JAX test...

I tried running them 4+ times. I think I will mark them with pytest.mark.xfail

cringeyburger avatar Aug 24 '24 14:08 cringeyburger

From this job, it is known that the IDAKLU tests failing on windows are highly inconsistent. Different tests fail at different runs. It would be better to skip the IDAKLU build on Windows for now and tackle it later.

As for IREE, they were consistent so I wrote conditionals with xfail

cringeyburger avatar Aug 24 '24 20:08 cringeyburger

@agriyakhetarpal @kratman, with this commit, all the requested changes have been made. It would be great if you could review this PR again.

Please note that before merging, I need to change the git clone to upstream in Dockerfile and benchmarks (and maybe some other files).

cringeyburger avatar Aug 25 '24 08:08 cringeyburger

Hmm... IREE tests are passing in CI builds but failing in unit tests. I have to take a look at this.

EDIT: I found out the problem. In CI builds, we are not setting PYBAMM_IDAKLU_EXPR_IREE: ON like in the test_on_push workflow. Hence, during the pybamm-requires session, the IREE repo and submodules are not installed in CI builds but are installed for tests. And during the nox -s unit, [iree] is not installed for CI builds tests but for test_on_push.

Therefore, everything passes in the tests for CI builds, but for test_on_push, it doesn't, which is ironic since it should be reversed (even though CI builds are not cloning IREE and other submodules, it passes all tests).

Now, I am confused. Do I fix the CI builds tests by adding PYBAMM_IDAKLU_EXPR_IREE: ON and let xfail tests be? Or do I remove PYBAMM_IDAKLU_EXPR_IREE: ON from the test_on_push and remove xfail conditions from the failing tests?

I think adding PYBAMM_IDAKLU_EXPR_IREE: ON to CI builds tests and having expected failing tests is better than removing the functionality itself.

CC: @agriyakhetarpal @kratman

cringeyburger avatar Aug 25 '24 09:08 cringeyburger

Just fixed the IREE issue. The PR is up for review (again). I hope I covered all the important things. @agriyakhetarpal @kratman

cringeyburger avatar Aug 25 '24 22:08 cringeyburger

The build failure is due to some network problem (don't know how). It will work if run again.

cringeyburger avatar Aug 26 '24 18:08 cringeyburger

I triggered https://github.com/pybamm-team/PyBaMM/actions/runs/10577390897/job/29305220084 from the develop branch, let's see if the Windows wheel repair step fails there, too.

Also, @kratman reported some troubles when installing PyBaMM from this branch on macOS. In his words, "It looks like it cannot find whatever library defines 'realtype' and 'booleantype'"

I'm not sure about that because I don't have the same issue on my macOS machine because everything compiles successfully after nox -s pybamm-requires and nox -s dev, but I do have a different issue though – the idaklu shared object isn't placed in the src/pybamm/solvers/ directory after installation for some reason (I did verify by changing the current pybamm/solvers/ to src/pybamm/solvers/ in CMakeLists.txt but to no avail). The shared object is placed in build/{wheel_tag}, but not copied into the PyBaMM directory (therefore the IDAKLU solver tests are skipped). I am on an M-series (ARM) device if that helps.

agriyakhetarpal avatar Aug 27 '24 11:08 agriyakhetarpal

Okay, I found out that either doing pip install ... or nox -s dev installed the solver in site packages, somewhere here /home/yukinatsu/PyBaMM/venv/lib/python3.11/site-packages/pybamm/solvers/idaklu.cpython-311-x86_64-linux-gnu.so, which is different for different virtual envs. I also remember the exact moment I changed this (I was trying to fix the BLAS not found problem, so I was trying different installation sites, then I wrote the BLAS script in FindSuiteSparse.cmake and forgot about it).

I fixed it with the following commit, now it installs within the src/pybamm/solvers. I also tested using pytest -m unit and nox -s unit and they worked like they before.


I don't know how IDAKLU tests were skipped for you.

Even without the solver being installed in src/pybamm/solvers, running pip install ... with pytest -m unit will work perfectly. And nox -s dev followed by, or just nox -s unit will work perfectly.

It will skip tests if you did nox -s dev and tried to do pytest -m unit without being in the virtual environment created by nox, since pybamm isn't installed locally (also note that pybamm might be installed in your local/conda env from previous attempts, though not perfectly).

It will work if you are in venv virtual environment (created during nox -s dev) and did pytest -m unit (tested it).

Under a self-made virtual environment (name it as venv inside the project root because if nox commands are later run, it will use the same env), using manual commands like pip install .[all,dev,jax] or pip install -e .[all,dev,jax] followed by pytest -m unit works.

nox -s pybamm-requires is common for both venv and self-made env, as it just downloads and installs in the project root dir.

NOTE: Remove build before running commands to get full on fresh compilation

cringeyburger avatar Aug 27 '24 12:08 cringeyburger

I just confirmed that the wheel builds and repairs on macOS locally with BUILD_IDAKLU="ON" python -m build --wheel --no-isolation and then delocate-wheel -v dist/*whl are fine when all of the libs are stored in sundials_KLU_libs/, which is great.

The problem is somewhere with the editable installation, and the command

BUILD_IDAKLU="ON" pip install --no-build-isolation -Cbuild-dir=build -ve.

copies the generated shared object (idaklu.cpython-312-darwin.so) for me in PyBaMM/venv/lib/python3.12/site-packages/pybamm/solvers/, and not in PyBaMM/src/pybamm/solvers – which is why it isn't found. This is the case for both scikit-build-core 0.9.10 and 0.10.5, so it either isn't an upstream issue, or it is but hasn't been discovered, or we are using an incorrect mechanism to load the shared object in pybamm.have_idaklu().

Non-editable installations (without -e flag) work fine.

Edit: just noticed you put your comment when I was writing mine, we mostly said the same thing.

agriyakhetarpal avatar Aug 27 '24 12:08 agriyakhetarpal

Thanks, https://github.com/pybamm-team/PyBaMM/pull/4352/commits/c561ebf7882570824db1f003312ae3f302895119 fixes it!

agriyakhetarpal avatar Aug 27 '24 12:08 agriyakhetarpal

Ideally, the build system should set CMAKE_CURRENT_SOURCE_DIR for us; it's weird that it doesn't... anyway, I'll continue testing a bit further locally and get back to you if there are any problems. Let's hope @kratman has just a Python installation issue or something up with his compiler, and not something that would be too difficult to debug.

agriyakhetarpal avatar Aug 27 '24 13:08 agriyakhetarpal

Also, I noticed that the repair step in the develop branch with Windows CI failed like it did with mine.

NOTE: now with the current fix, all CI builds fail again :( Error is something like this:
+ sh -c 'auditwheel repair -w /tmp/cibuildwheel/repaired_wheel /tmp/cibuildwheel/built_wheel/pybamm-24.5-cp39-cp39-linux_x86_64.whl' INFO:auditwheel.main_repair:Repairing pybamm-24.5-cp39-cp39-linux_x86_64.whl INFO:auditwheel.main_repair:This does not look like a platform wheel, no ELF executable or shared library file (including compiled Python C extension) found in the wheel archive

The solver is installing in the project dir solvers, instead of the wheel's solvers. Need to set a conditional to fix it (this is disappointing).

cringeyburger avatar Aug 27 '24 13:08 cringeyburger

Note that, with this commit, the wheel built locally won't have the solver in it, since no env vars are set during local wheel builds to keep the condition, so delocate will fail

cringeyburger avatar Aug 27 '24 13:08 cringeyburger