zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

Sharding storage transformer for v3

Open jstriebel opened this issue 3 years ago • 13 comments

Technical Notes This PR is based on and includes changes from #1096. Therefore code review should probably done on the diff PR https://github.com/scalableminds/zarr-python/pull/2. Any suggestions about the spec or the sharding mechanism in general should probably be placed at https://github.com/zarr-developers/zarr-specs/pull/152.

Description

This PR

  • adds a sharding storage transformer:
    • based on the current spec proposal from ZEP2
    • supports partial reads
    • writes are limited so far: partial writes are not efficient yet, shards are always completely rewritten (a more efficient implementation might be a good follow-up)
  • adapts the v3 fsspec store to allow partial reads
  • uses partial reads for uncompressed v3 arrays (some refactoring might be needed in core.py to simplify the code-paths, but I'd rather defer this to a separate PR to keep the diff readable)
  • added test cases
  • resolves parts of #1106

TODO:

  • [x] Add unit tests and/or doctests in docstrings
  • [x] Add docstrings and API docs for any new/modified user-facing classes and functions
  • ~~New/modified features documented in docs/tutorial.rst~~ (v3 is not yet documented)
  • [x] Changes documented in docs/release.rst
  • [x] GitHub Actions have all passed
  • [x] Test coverage is 100% (Codecov passes)
  • [ ] Consider adding a short-cut for sharded array creation (e.g. having a chunks_per_shard argument directly)

jstriebel avatar Aug 22 '22 11:08 jstriebel

Codecov Report

Merging #1111 (a2eb332) into main (6f11ae7) will not change coverage. The diff coverage is 100.00%.

:exclamation: Current head a2eb332 differs from pull request most recent head dbf9fff. Consider uploading reports for the commit dbf9fff to get more accurate results

@@            Coverage Diff             @@
##              main     #1111    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           35        36     +1     
  Lines        14410     14739   +329     
==========================================
+ Hits         14410     14739   +329     
Impacted Files Coverage Δ
zarr/_storage/v3.py 100.00% <100.00%> (ø)
zarr/_storage/v3_storage_transformers.py 100.00% <100.00%> (ø)
zarr/core.py 100.00% <100.00%> (ø)
zarr/meta.py 100.00% <100.00%> (ø)
zarr/tests/test_core.py 100.00% <100.00%> (ø)
zarr/tests/test_storage_v3.py 100.00% <100.00%> (ø)
zarr/util.py 100.00% <100.00%> (ø)
zarr/storage.py 100.00% <0.00%> (ø)
zarr/tests/test_storage.py 100.00% <0.00%> (ø)

codecov[bot] avatar Aug 22 '22 12:08 codecov[bot]

This pull request introduces 2 alerts when merging 06ce675073e2e86688e1d1aea126644a82ab4791 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes
  • 1 for Use of the return value of a procedure

lgtm-com[bot] avatar Aug 22 '22 12:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 4c0807eff4662bcdb3895709eb276528eb13eba5 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes
  • 1 for Use of the return value of a procedure

lgtm-com[bot] avatar Aug 22 '22 13:08 lgtm-com[bot]

This pull request introduces 2 alerts when merging 61db74a1feab92f4648480fee7c78e5dd1dafc29 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes
  • 1 for Use of the return value of a procedure

lgtm-com[bot] avatar Aug 22 '22 15:08 lgtm-com[bot]

This pull request introduces 1 alert when merging fde61e86e59d8743ec7b6b90131f3cd4d58b5b42 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes

lgtm-com[bot] avatar Aug 22 '22 16:08 lgtm-com[bot]

This pull request introduces 1 alert when merging de4de18c31f21b34bbf2d6f97f27d2519a1b48f6 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes

lgtm-com[bot] avatar Aug 23 '22 11:08 lgtm-com[bot]

This pull request introduces 1 alert when merging 0deb2b63935b4fe4bc8138ff9817ea204474bfa4 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes

lgtm-com[bot] avatar Aug 23 '22 12:08 lgtm-com[bot]

This pull request introduces 1 alert when merging d3eda71a9bbc201f58169101e6107c37d9e2627d into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes

lgtm-com[bot] avatar Aug 23 '22 15:08 lgtm-com[bot]

This pull request introduces 1 alert when merging 6e2790c6b5596471bbb47ccdcad8d9f035942f58 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes

lgtm-com[bot] avatar Aug 24 '22 13:08 lgtm-com[bot]

This pull request introduces 1 alert when merging e7b14b7ec07ab56c990365bee639add8f80c80b4 into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes

lgtm-com[bot] avatar Aug 25 '22 12:08 lgtm-com[bot]

This pull request introduces 1 alert when merging a52300c330af8281629ec902ec3e56546723ce1f into 44de0e4a017b8919bb5caba41eadcd67e18abdb9 - view on LGTM.com

new alerts:

  • 1 for `__eq__` not overridden when adding attributes

lgtm-com[bot] avatar Aug 25 '22 13:08 lgtm-com[bot]

@joshmoore & @grlee77 I think this PR is now ready for a round of review, would you have time for it? As described above, it builds upon #1096 and adds a sharding transformer implementation, together with some optimizations with partial reads. Reviewing this is probably best done on the diff PR https://github.com/scalableminds/zarr-python/pull/2.

I left one TODO open, what do you think about it?

Consider adding a short-cut for sharded array creation (e.g. having a chunks_per_shard argument directly)

jstriebel avatar Aug 25 '22 14:08 jstriebel

@martindurant Thanks a lot for the review! Sorry I'm answering so late, I was quite busy the last weeks, and will be on vacation the next two weeks, so I'll just get back to this PR afterwards. Please feel free to adapt anything also directly, e.g. having a PR against this one, since you are much more familiar with the internals of fsspec and have a great overview of the zarr code-base :+1:

jstriebel avatar Sep 08 '22 14:09 jstriebel

@joshmoore I just added a ZARR_V3_SHARDING flag for sharding usage, as discussed previously.

jstriebel avatar Dec 22 '22 13:12 jstriebel

@jstriebel : ok to leave you to push the update here? (If not, I'll dig in)

joshmoore avatar Jan 16 '23 16:01 joshmoore

@jstriebel : ok to leave you to push the update here? (If not, I'll dig in)

Sure, done :+1:

jstriebel avatar Jan 19 '23 13:01 jstriebel

Getting this merged

:rocket:

I'll prep a 2.14.0 (including ensure_bytes) unless there are any concrete versioning suggestions.

Seems fine to me :+1:

jstriebel avatar Jan 23 '23 15:01 jstriebel

@joshmoore I don't see any further actionables for this PR, I'd be glad if we could merge it soon.

jstriebel avatar Feb 02 '23 14:02 jstriebel

Yessirs.

joshmoore avatar Feb 02 '23 14:02 joshmoore

This is huge! 🙌

rabernat avatar Feb 02 '23 14:02 rabernat

:tada:

jstriebel avatar Feb 02 '23 14:02 jstriebel