Zstd Codec on the GPU
This PR adds a Zstd codec that runs on the GPU using the nvCOMP 4.2 python APIs.
TODO:
- [ ] Make fully async
- [ ] Performance benchmarking
- [ ] CPU-GPU roundtrip testing
- [ ] Add unit tests and/or doctests in docstrings
- [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
- [ ] New/modified features documented in docs/user-guide/*.rst
- [ ] Changes documented as a new file in changes/
- [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)
Thanks for opening this PR! At the moment we do not have any codecs implemented in the zarr-python package, but instead store them in numcodecs. So although it looks like there are some zarr-python specific changes that are needed to support the new GPU codec, the actual codec should be implemented in numcodecs, and then imported in zarr-python.
@dstansby My understanding was that numcodecs is the place that python bindings to native codec implementations live and that with v3, the Codec class itself lives in zarr-python. The GPU codecs and python bindings are implemented in nvCOMP and imported through the nvidia-nvcomp-cu12 python package so I'm not sure which part of this would need to go in numcodecs. What did you have in mind?
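For concreteness, a minimal sketch of that layering, assuming the nvCOMP Python API shipped in nvidia-nvcomp-cu12 (the nvcomp call signatures below are a best guess from its samples, not verified; the real implementation lives in src/zarr/codecs/gpu.py in this PR):

```python
# Sketch only: a zarr v3 bytes-to-bytes codec wrapping nvCOMP's Zstd.
# Metadata (de)serialization and async offloading are omitted (see the
# "Make fully async" TODO above); the nvcomp signatures are assumptions.
from dataclasses import dataclass

from nvidia import nvcomp  # provided by the nvidia-nvcomp-cu12 package

from zarr.abc.codec import BytesBytesCodec
from zarr.core.array_spec import ArraySpec
from zarr.core.buffer import Buffer
from zarr.registry import register_codec


@dataclass(frozen=True)
class GpuZstdCodec(BytesBytesCodec):
    async def _encode_single(self, chunk_bytes: Buffer, chunk_spec: ArraySpec) -> Buffer:
        codec = nvcomp.Codec(algorithm="Zstd")  # assumed constructor
        encoded = codec.encode(nvcomp.as_array(chunk_bytes.as_array_like()))
        return chunk_spec.prototype.buffer.from_array_like(encoded)

    async def _decode_single(self, chunk_bytes: Buffer, chunk_spec: ArraySpec) -> Buffer:
        codec = nvcomp.Codec(algorithm="Zstd")
        decoded = codec.decode(nvcomp.as_array(chunk_bytes.as_array_like()))
        return chunk_spec.prototype.buffer.from_array_like(decoded)

    def compute_encoded_size(self, input_byte_length: int, chunk_spec: ArraySpec) -> int:
        raise NotImplementedError  # compressed size is not known up front


register_codec("zstd", GpuZstdCodec)  # same name as the CPU implementation
```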
Fixed some merge conflicts and changed the _convert_{to,from}_nvcomp_arrays methods to be sync.
Codecov Report
:x: Patch coverage is 58.02469% with 34 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 61.24%. Comparing base (e1990c0) to head (ac14838).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/zarr/codecs/gpu.py | 54.92% | 32 Missing :warning: |
| src/zarr/codecs/__init__.py | 0.00% | 1 Missing :warning: |
| src/zarr/core/buffer/gpu.py | 87.50% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #2863 +/- ##
==========================================
- Coverage 61.25% 61.24% -0.01%
==========================================
Files 84 85 +1
Lines 9949 10017 +68
==========================================
+ Hits 6094 6135 +41
- Misses 3855 3882 +27
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/zarr/core/array.py | 68.69% <100.00%> (+0.07%) | :arrow_up: |
| src/zarr/core/config.py | 29.16% <ø> (ø) | |
| src/zarr/codecs/__init__.py | 0.00% <0.00%> (ø) | |
| src/zarr/core/buffer/gpu.py | 45.45% <87.50%> (+4.20%) | :arrow_up: |
| src/zarr/codecs/gpu.py | 54.92% <54.92%> (ø) | |
@d-v-b f16d730b9b5b13bcfec934cd26ccee0cd8a6e466 fixes an issue after https://github.com/zarr-developers/zarr-python/pull/3228, I think. Since we're providing an alternative zstd implementation here, when we figure out which codec to use when creating an array we need to look up the codec class registered for zstd instead of just using zarr.codecs.ZstdCodec(). LMK if that sounds right to you.
This definitely seems like a good approach given the current state of our codec registry. More broadly, I'm not convinced that we have an ergonomic way of supporting multiple implementations of the same codec...
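To make that concrete, the lookup would go through the registry rather than a hard-coded class, along these lines (a sketch; get_codec_class resolving via the registry and config is the assumed mechanism):

```python
# Resolve whichever class is currently registered/configured for "zstd",
# which may be the CPU ZstdCodec or a GPU implementation.
from zarr.registry import get_codec_class

zstd_cls = get_codec_class("zstd")
codec = zstd_cls()
```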
Akshay responded on this point in February(!) @dstansby could you at least respond to that directly: that this is a wrapper around a codec engine, in exactly the same way that https://github.com/zarr-developers/zarr-python/blob/abbdbf2be70a24e7d662b5ed449c68f6718977f9/src/zarr/codecs/zstd.py is?
Finally, this should also get some documentation in the user guide about how to swap between using the CPU and GPU when both are an option.
That's already included in gpu.rst.
where is the source code for the nvcomp zstd implementation, and the python bindings?
where is the source code for the nvcomp zstd implementation, and the python bindings?
I don't believe that the source code for those are published publicly.
where is the source code for the nvcomp zstd implementation, and the python bindings?
I don't believe that the source code for those are published publicly.
That's potentially quite problematic. We recently had problems relating to skew across zstd implementations. If we cannot inspect the source code for this codec, and we cannot submit patches, then I would definitely not be interested in experiencing bugs from it.
If we cannot inspect the source code for this codec, and we cannot submit patches, then I would definitely not be interested in experiencing bugs from it.
Is that you speaking as a user of zarr or a maintainer?
From the user perspective, this will only be used if you activate it.
From the maintainer's perspective, I'd hope that through documentation and clear error messages we can pinpoint the issue for users.
Akshay responded on this point https://github.com/zarr-developers/zarr-python/pull/2863#issuecomment-2691011494(!) @dstansby could you at least respond to that directly: that this is a wrapper around a codec engine, in exactly the same way that abbdbf2/src/zarr/codecs/zstd.py is?
Sorry, I missed that. And sorry for the slow reply - I am not paid to work on zarr-python, and review stuff on a best efforts basis.
re. where stuff lives, I did not realise that we had codec classes implemented directly in zarr-python. Indeed, our migration guide says (!!)
zarr.codecs has gone, use numcodecs instead
Given that codecs classes are in zarr-python already, and given the direction of conversation in https://github.com/zarr-developers/zarr-python/issues/3272, I'm happy to put this in zarr-python.
That's already included in gpu.rst.
Am I right in thinking that GPU arrays and now decompression are a global option, not a per-array or per-operation configuration? If so it would be good to clarify in gpu.rst that a) this is only a global option, b) whether it can be changed during a Python session, and c) how decompression of GPU buffers is handled for codecs without a GPU implementation (I presume it just falls back to the CPU implementation).
re. @d-v-b's point about skew across zstd implementations, I think this is a very important point. I'm personally on the fence about depending on a closed-source library, but as a minimum I think there should be tests that compare compression/decompression with the (open source, I presume?) CPU implementation and the new GPU implementation, and make sure the results are the same.
Is that you speaking as a user of zarr or a maintainer?
Both? Zarr users and maintainers routinely expose bugs in libraries we depend on, and we routinely forward issues to those libraries, and occasionally contribute patches. If users report a bug or problem in nvcomp, what do we do? Suppose you or Akshay leave NVIDIA, should I email [email protected] when we have problems?
Sorry, I missed that. And sorry for the slow reply - I am not paid to work on zarr-python, and review stuff on a best efforts basis.
Likewise :/
Am I right in thinking that GPU arrays and now decompression are a global option, not a per-array or a per-operation configuration?
I'm not sure... In AsyncArray.getitem there is a prototype argument that seems to be used for the buffers. I'm not sure whether the intent is for that to control all intermediate buffers or just the user-facing output buffer. The whole codec pipeline is still a bit of a black box to me.
I think that https://zarr.readthedocs.io/en/stable/user-guide/config.html is the relevant documentation here:
For selecting custom implementations of codecs, pipelines, buffers and ndbuffers, first register the implementations in the registry and then select them in the config.
This isn't adding any new concepts: just a new implementation that uses the existing configuration system (which I think is fair to summarize as a global managed through our configuration system).
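A minimal sketch of that existing pattern, reusing the illustrative GpuZstdCodec name from above (the "zarr.codecs.gpu.GpuZstdCodec" path is illustrative):

```python
import zarr
from zarr.registry import register_codec

# Register the alternative implementation, then select it in the config by
# fully-qualified class path, per the documentation quoted above.
register_codec("zstd", GpuZstdCodec)
zarr.config.set({"codecs.zstd": "zarr.codecs.gpu.GpuZstdCodec"})
zarr.config.enable_gpu()  # switch the default buffers to the GPU variants
```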
if it can be changed during a python session,
Sure... It's the same as any other configuration option. We don't explicitly show using any option in a context manager, but we do link to https://github.com/pytroll/donfig, which documents that behavior. I can duplicate that in our documentation if you want.
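Since zarr.config is a donfig config object, the scoped form should look like this (a sketch, using the illustrative config path from above):

```python
import zarr

# donfig's set() doubles as a context manager, so the override is undone
# when the block exits.
with zarr.config.set({"codecs.zstd": "zarr.codecs.gpu.GpuZstdCodec"}):
    data = zarr.open_array("data.zarr")[:]
```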
and c) how decompression of GPU buffers is handled for codecs without a GPU implementation (I presume it just falls back to the CPU implementation)
Yeah, it's following the documented behavior: the codec class associated with a given codec ID is used. We can repeat that in the GPU documentation and cross-reference the configuration docs.
So from a generic codec side of things, it'll just use whatever codec implementation is associated with that codec name. From a GPU-specific side of things, we'll want to document which codecs are currently implemented (I imagine the remaining codecs will be a relatively straightforward refactor based on top of what this branch implements).
If users report a bug or problem in nvcomp, what do we do?
An issue filed at https://github.com/NVIDIA/CUDALibrarySamples would be best (I personally would ask the reporter to submit that rather than filing on their behalf, but you're probably more patient than me 😄).
I think there should be tests that compare the compression/decompression
I'll add those to this PR.
I think there should be tests that compare the compression/decompression
https://github.com/zarr-developers/zarr-python/pull/2863/files#diff-b4b3e1bcdd519d839ba90f79f062ef0147362dbac854af26f871b53796791f21R69 has that.
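For reference, a sketch of the cross-implementation roundtrip being asked for, assuming the array's compressor is zstd and reusing the illustrative config path from above:

```python
import numpy as np
import pytest
import zarr


@pytest.mark.gpu
def test_zstd_cpu_gpu_roundtrip(tmp_path) -> None:
    data = np.random.default_rng(0).integers(0, 255, size=(64, 64), dtype="u1")

    # Write through the default (CPU) zstd implementation...
    arr = zarr.create_array(tmp_path / "a.zarr", shape=data.shape, dtype=data.dtype)
    arr[:] = data

    # ...then read back with the GPU implementation selected for "zstd".
    with zarr.config.set({"codecs.zstd": "zarr.codecs.gpu.GpuZstdCodec"}):
        np.testing.assert_array_equal(zarr.open_array(tmp_path / "a.zarr")[:], data)
```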
I believe that all the comments have been addressed now but let me know if I missed anything.
I'm unsure why codecov is reporting that the coverage dropped here. Locally I see 100% coverage for codecs/gpu.py, and the lines changed in core/buffer/gpu.py in this PR are covered.
I confess that I'm worried about retracing the steps that led to a large number of poorly-maintained storage backends in zarr-python 2. I'm not saying that this functionality will be poorly maintained, but it is a risk, and I think we need to make a decision as a project about how we weigh that risk. For this particular feature, what would it look like if we spun it off into a separate repo under the zarr-developers org?
For this particular feature, what would it look like if we spun it off into a separate repo under the zarr-developers org?
Let's see:
For users, I think it could in principle be mostly seamless. We have entry points for loading a codec class without requiring users to import it. If this were in a separate package, I'd push for the gpu extra to depend on it, so that getting zarr-python with GPU support is still just pip install zarr[gpu] (this does induce a circular dependency, but it's in an extra so maybe that's not so bad).
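A sketch of what that discovery looks like from the consuming side (the "zarr.codecs" entry-point group name is my assumption about the registry; the package on the right is illustrative):

```python
from importlib.metadata import entry_points

# Any installed package advertising a codec in this group would be
# discoverable without an explicit import.
for ep in entry_points(group="zarr.codecs"):
    print(ep.name, "->", ep.value)  # e.g. "zstd -> zarr_gpu.codecs:GpuZstdCodec"
```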
We'd need some system to ensure that zarr.config.enable_gpu() has the ability to ask 3rd-party packages what changes to make. I don't think that system currently exists and so would need to be designed and implemented.
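A purely hypothetical sketch of what such a system could look like; nothing below exists in zarr-python today:

```python
from collections.abc import Callable

import zarr

# Hypothetical hook registry that enable_gpu() would consult so third-party
# packages can contribute their own config overrides.
_gpu_config_hooks: list[Callable[[], dict[str, str]]] = []


def register_gpu_config_hook(hook: Callable[[], dict[str, str]]) -> None:
    _gpu_config_hooks.append(hook)


def enable_gpu_with_plugins() -> None:
    overrides: dict[str, str] = {}
    for hook in _gpu_config_hooks:
        overrides.update(hook())  # e.g. {"codecs.zstd": "zarr_gpu.codecs.GpuZstdCodec"}
    zarr.config.set(overrides)
```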
For maintainers, it'd be additional overhead from a separate project (CI, packaging, docs), which is minor but non-negligible.
My main question: is zarr-python really willing to commit to this as a stable interface, to the degree that third parties can build on it without worrying about things breaking? I worry that we don't know enough about the codec interface and its interaction with buffers yet to really commit to that (e.g. all your suggestions about buffers earlier in the thread).
My main question: is zarr-python really willing to commit to this as a stable interface, to the degree that third parties can build on it without worrying about things breaking? I worry that we don't know enough about the codec interface and its interaction with buffers yet to really commit to that (e.g. all your suggestions about buffers earlier in the thread).
I am personally not convinced that our codec API is as good as it could be, and I think the process of developing new codecs is a really important part of finding out how we can improve it. In particular, I'm worried that we are missing out on really simple performance optimizations like making the basic encoding and decoding routines take an optional buffer. But this is a separate question from whether we have the will to change this API any time soon.
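A sketch of the kind of signature being floated here (a possible future shape, not the current codec API):

```python
from typing import Protocol

from zarr.core.array_spec import ArraySpec
from zarr.core.buffer import Buffer


class BufferReusingCodec(Protocol):
    async def _decode_single(
        self,
        chunk_bytes: Buffer,
        chunk_spec: ArraySpec,
        out: Buffer | None = None,  # decode into this buffer when provided
    ) -> Buffer: ...
```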
If we optimistically assume that the codec API might evolve in the future, then it does make things easiest if the functionality in this PR is defined inside zarr-python. So that's a solid point in favor of keeping it in this repo.
As a counter-point, the other day I was doing development on my mac laptop and I made changes that caused failures in the GPU tests. My mac doesn't have an nvidia gpu, so I couldn't run any of the cuda-specific tests, and so I had to rely on github actions to debug. That's a pretty bad development experience, so I do worry that this will only get worse if we add more cuda-specific code that needs to be updated as part of codebase-wide refactoring efforts, but can only be developed on a system with an nvidia gpu. The development friction this adds might actually cancel out the friction reduced by keeping the gpu-zstd code close to the codec definitions. So in the end I'm really not sure where the cuda zstd codec should live.
I don't want this effort to be stalled, and I think the functionality here is really important. It would be great to get input on this from other @zarr-developers/python-core-devs
Most OSS devs I know don't have access to a NVIDIA GPU on their machine, so having NVIDIA-specific code in here does greatly increase the barrier to development (basically, I'm echoing the above comment).
GPU development also requires provisioning a non-standard GitHub Actions runner, which I believe we're paying for somehow? There's a risk that this goes away in the future.
So I'd be +1 to moving code designed for non-CPU hardware outside of zarr-python, perhaps into zarr-python-gpu or similar? The other advantage is you could move faster and break stuff (and not rely on zarr-python core dev reviews), compared to developing in zarr-python, which has to stay very stable for its wide user base.
I'm not that deep in the existing GPU code, so practically I'm not sure how feasible moving it out would be. But at least for codecs, it seems like it should definitely be possible to implement custom codecs in other packages?
We talked about this a bit at last week's dev call. I think there's general agreement that native support for GPUs is important. And that at a technical level developing GPU-specific bits here or in some other repo is feasible.
For better or worse, we've defined public interfaces for many parts of Zarr-python. Having gpu.{Buffer,NDBuffer} in this repo and running on CI makes it clear when a change breaks something that the gpu implementation was relying on. But actually debugging and fixing those issues is extremely challenging without access to the right hardware.
We can split that out to another repo that contains the (NVIDIA) GPU specific bits (and likewise for MLX and other accelerators that want their own memory types, codecs, store, etc). But I do think it's important that the zarr-python project commits to supporting an interface. That involves a higher commitment to stability, and might mean some kind of (informal) policy on reverting commits that happen to break downstream libraries, assuming the issue is noticed and reported quickly. If we pair a strong commitment to stability from zarr-python core, along with "champions" from any of the downstream projects that commit to reporting issues and helping debug problems at the boundary, then I think splitting this out to a third-party repo and building on the extension points should work.
We talked about this PR (and non-CPU-device extensions to zarr-python in general) at the recent zarr-python dev meeting.
While in the long run I think we would be happy to see a stand-alone zarr-developers/zarr-zstd repo containing CUDA (and CPU!) implementations of the Zstd codec, it's not clear that we can ensure the stability of the codec, buffer, and store APIs etc. needed for that to be a smooth developer experience. That points to putting this functionality in zarr-python. The prospect of exposing this under the potential zarr.experimental module seemed like a good balance between getting it into users' hands while also giving us flexibility down the road. That path relies on getting #3490 merged.
Thanks for the update. I can get the merge conflicts and any remaining issues ironed out this week.