numcodecs
numcodecs copied to clipboard
Add out-of-tree Pyodide builds in CI for `numcodecs`
This PR adds a CI job (emscripten-ci.yaml) that builds WASM wheels for numcodecs for use in Pyodide-based environments using the Emscripten toolchain. This was discussed in one of the community meetings for Zarr where it was said that this change would be welcome as a pull request, and follow-up work after this is complete is planned:
- A job that publishes nighty WASM wheels for numcodecs to a PyPI-like distribution channel (not sure where, this might require requesting access on the https://github.com/scientific-python/upload-nightly-action repository (which corresponds to https://anaconda.org/scientific-python-nightly-wheels/)
- A corresponding job for out-of-tree testing for Zarr that builds up on this job (and subsequently its nightly pushes)
- Interactive documentation improvements with Sphinx that make use of these nightly wheels in some form (longer-term goal, soon but not very soon I guess)
TODO:
- [ ] Unit tests and/or doctests in docstrings
- [ ] Tests pass locally
- [ ] Docstrings and API docs for any new/modified user-facing classes and functions
- [ ] Changes documented in docs/release.rst
- [ ] Docs build locally
- [ ] GitHub Actions CI passes
- [ ] Test coverage to 100% (Codecov passes)
Additional context
Some xrefs for reviewers and project maintainers:
- NumPy's out-of-tree job: https://github.com/numpy/numpy/blob/main/.github/workflows/emscripten.yml and corresponding PR: https://github.com/numpy/numpy/pull/25894
- The same for PyWavelets (optional dependency for
scikit-image): https://github.com/PyWavelets/pywt/pull/701 and https://github.com/PyWavelets/pywt/pull/714 - For the
pandasrepository https://github.com/pandas-dev/pandas/pull/57896 - https://github.com/scikit-image/scikit-image/pull/7350
- and so on
The workflow is currently going to fail because the file system access in Emscripten has a bug with saving .npy files – I had started on this PR roughly two weeks back, and I have now rebased on top of the changes from the main branch. Here is the related issue that I opened on the Pyodide repository a few hours ago: https://github.com/pyodide/pyodide/issues/4779
@rgommers and I had fixed a similar error in https://github.com/PyWavelets/pywt/pull/701 by converting all of the .npy files present to .npz files. If this is a suitable workaround at this time and does not render other PRs un-mergeable, please let me know and I would be happy to apply this fix here (it works/worked for loading .npz files, it should most likely work for saving them as well).
Thanks, @agriyakhetarpal. I've launched the workflows.
Ah, I think the miniconda action needs updating. I'll push a commit to fix that in a moment.
Never mind, looks like #528 is already on it – I can wait and rebase later after that gets merged. For the relevant jobs here, the Pyodide wheel build succeeds while the tests fail because of the reasons described above, and the rest of the test suite passes.
Marking this PR as a draft till there is a fix for this on the Pyodide and Emscripten ends. I will put my focus on an accompanying CI job and associated fixes for the Zarr repository.
It's been a while since this PR has been opened, could we get the ball rolling by temporarily skipping the previously failing test for now? I feel that numcodecs's CI can benefit from adding this workflow for the rest of the test suite, at least. I've revisited the merge conflicts in the previous few commits and updated the workflow file in 764f34e5c0fd1c31a5316af760e3fc1848e28410, I'll see how the builds go.
I've created https://github.com/pyodide/pyodide/pull/5169 and https://github.com/pyodide/pyodide/pull/5172 (please see above) for adding some missing dependencies to Pyodide. As a temporary measure, I've skipped the sole test that requires an updated version of the crc32c package: numcodecs/tests/test_checksum32.py::test_crc32c_checksum in 53f2096a5e8ae3b3f9fcc8779b1d6525291e66ca. I'll mark this PR as ready for review for now. @joshmoore, could you please trigger the workflows for me again?
https://github.com/agriyakhetarpal/numcodecs/actions/runs/11746955955 via my fork is the first passing workflow run, with a few caveats:
- A (one)
crc32ctest is temporarily disabled (see above, will be available in the next Pyodide release) zfpyis not available yet (see above, will be available in the next Pyodide release)- Zarr v3 tests are temporarily disabled until we upgrade to Zarr v3 in Pyodide, hopefully by the next release
- I had to skip a doctest that was using threads in a messy manner, but it should work.
The rest of the skips/xfails are the usual WASM limitations where the multiprocessing/threading modules are not available, and we have a pretty significant portion of the test suite passing without issues. I've compiled crc32c and zfpy with Emscripten locally in the time being and triggered the test suite for numcodecs with them, which didn't reveal anything problematic. I'm open to including them in the CI job temporarily if needed – they don't take too long to compile.
Returning to this, I opened a PR to add pcodec to the Pyodide distribution, linked above. I hope we get it in a Pyodide 0.27.3 release sooner or later. I'm not aware of any new upcoming dependencies for numcodecs in the ambit, but we can get all of them added as long as they don't take too long to build.
No worries at all, @joshmoore, and I'm grateful for the review! This would need to wait slightly until we get pcodec available (maybe later this month), and I expect that things will pass here without many problems. The doctest running in WASM is a tricky problem indeed – one way could be to not run the tests from the docs/ directory, but as numcodecs doesn't follow the src/ format, we have no option but to modify the PYTHONPATH or similar so that it doesn't try to import from the wrong numcodecs/ directory.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.96%. Comparing base (
308cc25) to head (34f1fc2).
Additional details and impacted files
@@ Coverage Diff @@
## main #529 +/- ##
=======================================
Coverage 99.96% 99.96%
=======================================
Files 63 63
Lines 2712 2729 +17
=======================================
+ Hits 2711 2728 +17
Misses 1 1
| Files with missing lines | Coverage Δ | |
|---|---|---|
| numcodecs/tests/common.py | 100.00% <100.00%> (ø) |
|
| numcodecs/tests/test_blosc.py | 100.00% <100.00%> (ø) |
|
| numcodecs/tests/test_entrypoints_backport.py | 100.00% <100.00%> (ø) |
|
| numcodecs/tests/test_shuffle.py | 100.00% <100.00%> (ø) |
|
| numcodecs/tests/test_zarr3.py | 100.00% <100.00%> (ø) |
|
| numcodecs/zarr3.py | 99.40% <100.00%> (+0.01%) |
:arrow_up: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
pcodec has now been added to Pyodide via pyodide/pyodide#5432. I'll convert this PR to a draft one until we have it in a Pyodide release.
We've cut a 0.27.3 release today, which bundles pcodec; all tests pass on my fork: https://github.com/agriyakhetarpal/numcodecs/actions/runs/13545498292?pr=529
It looks like #569 is going to change a few things here – I have to note that Pyodide will need to set the provisioned NUMCODECS_USE_SYSTEM_LIBS variable to OFF, as we cross-compile from Linux to wasm32-emscripten and two patches are being applied to Blosc, for which we have to build it from the vendored sources.
I'm still happy to discuss with upstream maintainers if they're interested in such patches as I mentioned above in this PR, and I assume it helps both them and numcodecs that the patches aren't big at all.
(Relaunched)
Thanks, I fixed the drop in the coverage – it's just that Codecov doesn't know about this WASM environment setup, so it can't be aware that this code path is being tested elsewhere.
Re-running jobs after:
Exception: Request failed after too many retries. URL: https://ingest.codecov.io/upload/github/zarr-developers::::numcodecs/upload-coverage
[PYI-4825:ERROR] Failed to execute script 'main' due to unhandled exception!
I've addressed the merge conflicts introduced in the release notes file at the time of the 0.16 release. Thanks!
I wanted to note that there is a recent PR at #737, which also concerns the wasm32-unknown-emscripten target. The reason why the Pyodide CI job here can bypass that and work despite those changes is because we drop a curated list of unsupported arguments when cross-compiling. However, #737 will be a helpful change indeed.
Hi! I've resolved the new merge conflicts that had sprung up here. Also, I made sure that we're testing against the same Pyodide version as we're building the WASM wheels for. If you'd like me to, I can also try against the latest version of Pyodide to check for any regressions (which will use Emscripten 4.0.9 instead of 3.1.58) – cibuildwheel v3 soon will help doing this (i.e., testing against nightly or alpha versions of Pyodide), and a beta (3.0.0b1) is already out for testing.
Additionally, I enabled Zarr v3 as a dependency (it uses threads which won't work in WASM, so perhaps it makes sense to not add it or skip any new tests that will fail due to it). zfpy is also available in Pyodide 0.27.6 (which I've updated to, including a bump to Node 22), so that helps us not skip those tests. I had added a WASM build for pcodec and enabled it some time earlier.
Test suite passes at https://github.com/zarr-developers/numcodecs/actions/runs/15199151699/job/42749864404?pr=529.
The sister PR to accompany this PR, which adds Pyodide support for Zarr, is now also ready for review (marked above). Thank you!