Migrate to `scikit-build-core`
Description
This PR is about migrating to a new build system, scikit-build-core from setuptools and wheel, while incorporating necessary changes in GitHub Workflows, Benchmarks and Documentation.
CHANGES IN DISCUSSION:
- [x] Task 27: Use SUNDIALS installation script to set the environmental variables for windows --- We currently don't support Windows users to run
nox -s pybamm-requiresorinstall_KLU_Sundials.pyscript. Adding this feature would change a lot of things
CHANGES REQUESTED:
- [x] Task 22: Remove images from
.gitignore - [x] Task 23: Include the whole
sundials_KLU_libsin.gitignore - [x] Task 24: Remove import sys from bottom call in test_idaklu_solver
- [x] Task 25: Skip the parameter_sets test
- [x] Task 26: Change the version requirements given the scikit-build-core docs
- [x] Task 28: Update docs for DAE solver (remove extra text from the tabs)
CHANGES IN PROGRESS:
COMPLETED, need reviews:
- [x] Task 4: Remove pins on other build-time dependencies like CasADi --- unpinned all instances of build-time dependencies, including
scikit-build-corein the direction to remove editable-rebuilds, or at least give a warning to the users before using it. - [x] Task 5: Remove
pip updatecommands fromnox. - [x] Task 13: Remove
set CMAKE_GENERATOR="Visual Studio 17 2022"andset CMAKE_GENERATOR_PLATFORM=x64and let CMake set it by itself. - [x] Task 14: Set defaults for Windows for the remaining 4 environment variables in
CMakeLists.txt. Use "if set, use that otherwise use default" logic forVCPKG_ROOT(renameVCPKG_ROOT_DIRtoVCPKG_ROOT) --- Can not fix using CMake variables, have to set environment variables, hence wrote a batch script to set the env vars permanently for the Windows user. The CI still use pre-defined env vars because this avoids issues related to interactive scripts. - [x] Task 19: Clean the
find BLASscript - [x] Task 3: Run tests, including IDAKLU ones, on all platforms (except doctests). --- random windows IDAKLU tests fail because of workers crashes. IREE tests had to be skipped because of attribute failure. Rest linux and macos tests are working.
- [x] Task 1: Check if the error in Doctests
(pybamm.parameter_sets)can be resolved using an example instead of explicitly setting all the names. --- need to set all names - [x] I unpinned
scikit-build-corebut will keep it as an option in the docs. I will write a warning saying that it is experimental.- [x] Task 2: Set the scikit-build-core pin to 0.10.5 from 0.10.3
- [x] Task 6: Write a clearer message about editable-rebuild command, which is supposed to help developers using partial rebuilds (it was written to use that command to contribute to PyBaMM).
- [x] Task 7: Remove the verbose flag from the editable-rebuilds command
- [x] Task 12: Set
python/python3instead ofpython3.Xin TOML kit installation in the guide. - [x] Task 15: Remove ambiguity from
# in the PyBaMM/ directory - [x] Task 16: Migrate all Windows instructions to the existing
Install from sourcedoc. - [x] Task 20: Add
pytestdocumentation from Contributors guide in Develop branch. - [x] Task 10: Include new "build" and "download_KLU_Sundials" directories in
.gitignore - [x] Task 18: Include
*.egg-infopattern in.gitignore - [x] Task 9: Update docs asking users to install VCPKG themselves.
- [x] Task 8: Remove images from the installation guide.
- [x] Task 11: Install VCPKG in a local directory
- [x] Task 17: We do not need the MSVC image
- [x] Task 18: Unpin
graphvizfrom docs and tests --- had to pin on tests because ofpermissionerrors, though it would work fine locally. - [X] Task 21: Add IREE Support
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.48%. Comparing base (
3bf3ea8) to head (a51b095). Report is 7 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #4352 +/- ##
===========================================
- Coverage 99.42% 95.48% -3.94%
===========================================
Files 299 299
Lines 22691 22715 +24
===========================================
- Hits 22560 21690 -870
- Misses 131 1025 +894
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
With this commit, editable + partial rebuilds are available through nox -s dev.
Note that we have to export BUILD_IDAKLU=ON to enable CMake and thus IDAKLU build
Updates till now:
- Wheels are built properly, and the package can be installed simply with
pip install .[...]. - CI Builds pass except for macOS, which can't find
Fortran(which is actually installed before the build step) to findSuiteSparselibraries andBLAS. This shouldn't be a problem on local ifgfortranis installed throughbrew. IDAKLUis compiled properly and is installed during wheel builds and installation, given we have exportedBUILD_IDAKLU=ONbefore running the build/install commands. Otherwise, a pure python package is installed.- Editable installs with rebuilds are working, and can be run using
pip install --no-build-isolation --config-settings=editable.rebuild=true -Cbuild-dir=build -ve.[...], provided we have the build-time dependencies, such ascmake,pybind11etc., are installed beforehand. - This is automated by running
nox -s devwhich installs the build-time dependencies. Note that this feature requirestomlkitto be installed before running anynoxcommand. Hence it is better to installnoxandtomlkithand in hand. - It is to be noted that with editable installs and rebuilds, the
IDAKLUsolver does not compile with changes inSuiteSparseandSUNDIALSfiles. This is becauseSuiteSparseandSUNDIALSaren't interfaced withpybind11directly, and thus are not linked with the compiled artifact. Apart from this, the editable installs work properly with workingIDAKLUsolver. - Docker builds are working, both locally and on CI.
- Doctests and other tests should work now since
noxandtomlkitare installed together on CI.
Problem with Benchmarks:
- The installation script installs the libs to
PyBaMM/sundials_KLU_libs. - CMake links them using
${CMAKE_CURRENT_SOURCE_DIR}/sundials_KLU_libs. - The thing with ASV on CI is that the script responsible for installing the packages installs them in
/home/runner/work/PyBaMM/PyBaMM/sundials_KLU_libs/. - But due to the build command:
python -m build --wheel -o {build_cache_dir} {build_dir}, it makes a separate venv+pip environment, and the path to the libs is changed to/home/runner/work/PyBaMM/PyBaMM/env/ec88222d46a18b5e9d12ba7194d09478/project/sundials_KLU_libs
Because of this, CMake isn't finding the libs and is failing to build them.
Updates: With this commit, the following things are covered:
- Main migration to
scikit-build-corewithIDAKLUturned off as default. This is overridden on CI jobs and by exportingBUILD_IDAKLU=ON - Editable installs are now available with
nox -s dev, which utilisestomlkitto install the build-time dependencies - CI builds are working for all three platforms
- Docker builds are updated and working
- Doctests, tests and benchmarks are working
Okay, now they work :)
After setting the compilation of the IDAKLU solver to default and providing a fallback, I ran workflows of tests, benchmarks, CI wheels' builds, and docker builds, and they all worked.
Please note that I set BUILD_IDAKLU=OFF for tests where IDAKLU wasn't needed and fails the builds if set to ON.
With this commit, documentation is complete for Install from source: Windows with improvements in other documentation.
I am not sure if the pre-commit's blacken-docs error relates to the changes I made.
EDIT: Found the problem
The error (warning) in doctests is a quick fix: two links have the same name. I will change it as soon as all tests are okay.
Seems like there is a problem upstream for scipy==1.14.1, I am pinning it to 1.14.0
It was the case of a missing M-series wheel for Python 3.12, which seems to have been fixed since the upload was processed manually. Could you unpin and check, @cringeyburger?
Thanks @kratman for reviewing the PR, especially for the documentation changes that have been added. Ankit is wrapping up his project and needs to write his final report for submission, so I'll try to find some time and prioritise reviewing this tomorrow and over the weekend.
On that note, @santacodes finished up his project today and is now looking to start with the IDAKLU solver separation milestone once he publishes his final report. @valentinsulzer and I were discussing with him some of the specifics around this in our meeting today, and whether it could be done in the course of a week or before we start with the v24.9 release cycle, which is not only subject to Santhosh's and Ankit's capabilities, but also to this PR getting merged, so the sooner it will be possible to, the better
On that note, @santacodes finished up his project today and is now looking to start with the IDAKLU solver separation milestone once he publishes his final report. @valentinsulzer and I were discussing with him some of the specifics around this in our meeting today, and whether it could be done in the course of a week or before we start with the v24.9 release cycle, which is not only subject to Santhosh's and Ankit's capabilities, but also to this PR getting merged, so the sooner it will be possible to, the better
Yeah that is why I was looking at this PR. To help with the separation task.
Due to the large number of conversations, it is becoming harder to track all the changes requested, changes in discussion and progress. To make it easier, I will be updating the first/top PR message after each update.
@agriyakhetarpal Is this supposed to happen? I tried running windows test with IDAKLU on.
test_symbol_visualise can be skipped, could you add a pytest condition there to skip it if the latex program is not found? You can use shutil.which() to find it. As for the IDAKLU-JAX test, I'm not too surprised, because we haven't run the tests against the solver on PRs before – I knew there would be some failures coming up. I would suggest running them again, and if they keep failing, we can mark them with pytest.mark.xfail and fix them later, since they aren't related to the build system migration.
@agriyakhetarpal Check this too. IREE is only for MacOS 13 and higher + python 3.11, and it is the only failing test. I was thinking of setting PYBAMM_IDAKLU_EXPR_IREE=OFF, i.e. by editing it in noxfile for time being till this is fixed
EDIT:
test_symbol_visualisecan be skipped...
I found out that it was a problem with using the latest version of graphviz on windows. I tried running it using 8.0.5, and it works without a problem.
As for the IDAKLU-JAX test...
I tried running them 4+ times. I think I will mark them with pytest.mark.xfail
From this job, it is known that the IDAKLU tests failing on windows are highly inconsistent. Different tests fail at different runs. It would be better to skip the IDAKLU build on Windows for now and tackle it later.
As for IREE, they were consistent so I wrote conditionals with xfail
@agriyakhetarpal @kratman, with this commit, all the requested changes have been made. It would be great if you could review this PR again.
Please note that before merging, I need to change the git clone to upstream in Dockerfile and benchmarks (and maybe some other files).
Hmm... IREE tests are passing in CI builds but failing in unit tests. I have to take a look at this.
EDIT: I found out the problem.
In CI builds, we are not setting PYBAMM_IDAKLU_EXPR_IREE: ON like in the test_on_push workflow.
Hence, during the pybamm-requires session, the IREE repo and submodules are not installed in CI builds but are installed for tests.
And during the nox -s unit, [iree] is not installed for CI builds tests but for test_on_push.
Therefore, everything passes in the tests for CI builds, but for test_on_push, it doesn't, which is ironic since it should be reversed (even though CI builds are not cloning IREE and other submodules, it passes all tests).
Now, I am confused. Do I fix the CI builds tests by adding PYBAMM_IDAKLU_EXPR_IREE: ON and let xfail tests be? Or do I remove PYBAMM_IDAKLU_EXPR_IREE: ON from the test_on_push and remove xfail conditions from the failing tests?
I think adding PYBAMM_IDAKLU_EXPR_IREE: ON to CI builds tests and having expected failing tests is better than removing the functionality itself.
CC: @agriyakhetarpal @kratman
Just fixed the IREE issue. The PR is up for review (again). I hope I covered all the important things. @agriyakhetarpal @kratman
The build failure is due to some network problem (don't know how). It will work if run again.
I triggered https://github.com/pybamm-team/PyBaMM/actions/runs/10577390897/job/29305220084 from the develop branch, let's see if the Windows wheel repair step fails there, too.
Also, @kratman reported some troubles when installing PyBaMM from this branch on macOS. In his words, "It looks like it cannot find whatever library defines 'realtype' and 'booleantype'"
I'm not sure about that because I don't have the same issue on my macOS machine because everything compiles successfully after nox -s pybamm-requires and nox -s dev, but I do have a different issue though – the idaklu shared object isn't placed in the src/pybamm/solvers/ directory after installation for some reason (I did verify by changing the current pybamm/solvers/ to src/pybamm/solvers/ in CMakeLists.txt but to no avail). The shared object is placed in build/{wheel_tag}, but not copied into the PyBaMM directory (therefore the IDAKLU solver tests are skipped). I am on an M-series (ARM) device if that helps.
Okay, I found out that either doing pip install ... or nox -s dev installed the solver in site packages, somewhere here /home/yukinatsu/PyBaMM/venv/lib/python3.11/site-packages/pybamm/solvers/idaklu.cpython-311-x86_64-linux-gnu.so, which is different for different virtual envs.
I also remember the exact moment I changed this (I was trying to fix the BLAS not found problem, so I was trying different installation sites, then I wrote the BLAS script in FindSuiteSparse.cmake and forgot about it).
I fixed it with the following commit, now it installs within the src/pybamm/solvers. I also tested using pytest -m unit and nox -s unit and they worked like they before.
I don't know how IDAKLU tests were skipped for you.
Even without the solver being installed in src/pybamm/solvers, running pip install ... with pytest -m unit will work perfectly. And nox -s dev followed by, or just nox -s unit will work perfectly.
It will skip tests if you did nox -s dev and tried to do pytest -m unit without being in the virtual environment created by nox, since pybamm isn't installed locally (also note that pybamm might be installed in your local/conda env from previous attempts, though not perfectly).
It will work if you are in venv virtual environment (created during nox -s dev) and did pytest -m unit (tested it).
Under a self-made virtual environment (name it as venv inside the project root because if nox commands are later run, it will use the same env), using manual commands like pip install .[all,dev,jax] or pip install -e .[all,dev,jax] followed by pytest -m unit works.
nox -s pybamm-requires is common for both venv and self-made env, as it just downloads and installs in the project root dir.
NOTE: Remove build before running commands to get full on fresh compilation
I just confirmed that the wheel builds and repairs on macOS locally with BUILD_IDAKLU="ON" python -m build --wheel --no-isolation and then delocate-wheel -v dist/*whl are fine when all of the libs are stored in sundials_KLU_libs/, which is great.
The problem is somewhere with the editable installation, and the command
BUILD_IDAKLU="ON" pip install --no-build-isolation -Cbuild-dir=build -ve.
copies the generated shared object (idaklu.cpython-312-darwin.so) for me in PyBaMM/venv/lib/python3.12/site-packages/pybamm/solvers/, and not in PyBaMM/src/pybamm/solvers – which is why it isn't found. This is the case for both scikit-build-core 0.9.10 and 0.10.5, so it either isn't an upstream issue, or it is but hasn't been discovered, or we are using an incorrect mechanism to load the shared object in pybamm.have_idaklu().
Non-editable installations (without -e flag) work fine.
Edit: just noticed you put your comment when I was writing mine, we mostly said the same thing.
Thanks, https://github.com/pybamm-team/PyBaMM/pull/4352/commits/c561ebf7882570824db1f003312ae3f302895119 fixes it!
Ideally, the build system should set CMAKE_CURRENT_SOURCE_DIR for us; it's weird that it doesn't... anyway, I'll continue testing a bit further locally and get back to you if there are any problems. Let's hope @kratman has just a Python installation issue or something up with his compiler, and not something that would be too difficult to debug.
Also, I noticed that the repair step in the develop branch with Windows CI failed like it did with mine.
NOTE: now with the current fix, all CI builds fail again :(
Error is something like this:
+ sh -c 'auditwheel repair -w /tmp/cibuildwheel/repaired_wheel /tmp/cibuildwheel/built_wheel/pybamm-24.5-cp39-cp39-linux_x86_64.whl' INFO:auditwheel.main_repair:Repairing pybamm-24.5-cp39-cp39-linux_x86_64.whl INFO:auditwheel.main_repair:This does not look like a platform wheel, no ELF executable or shared library file (including compiled Python C extension) found in the wheel archive
The solver is installing in the project dir solvers, instead of the wheel's solvers. Need to set a conditional to fix it (this is disappointing).
Note that, with this commit, the wheel built locally won't have the solver in it, since no env vars are set during local wheel builds to keep the condition, so delocate will fail