feat: Implement ZEP 8 URL syntax support for zarr-python
This PR implements support for the ZEP 8 URL syntax in Zarr Python.
Some examples of what now works:
import xarray as xr
import zarr
root = zarr.open_group('s3://bucket/data.zip|zip:|zarr3:')  # S3 → ZIP → Zarr v3
arr = zarr.create_array('memory:|zarr2:group/array', shape=(10,), dtype='i4')  # Memory → Zarr v2
# custom adapter for icechunk
ds = xr.open_zarr('s3://icechunk-public-data/v1/glad|icechunk:')  # icechunk (from xarray)
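To make the shape of these URLs concrete, here is a minimal sketch of how a pipe-delimited URL could be split into `scheme:path` segments. It is not the parser added in this PR; `Segment` and `split_pipeline` are hypothetical names used only for illustration, and the sketch assumes every segment carries an explicit `scheme:` prefix (so it rejects a bare local path as the first segment).

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    scheme: str  # store or adapter name, e.g. "s3", "zip", "zarr3"
    path: str    # everything after the first ":" (may be empty)


def split_pipeline(url: str) -> List[Segment]:
    """Split a pipe-delimited URL into (scheme, path) segments.

    Illustrative only: assumes each segment has the form "scheme:path".
    """
    segments = []
    for part in url.split("|"):
        scheme, sep, path = part.partition(":")
        if not sep or not scheme:
            raise ValueError(f"segment {part!r} has no 'scheme:' prefix")
        segments.append(Segment(scheme, path))
    return segments


if __name__ == "__main__":
    for seg in split_pipeline("s3://bucket/data.zip|zip:|zarr3:"):
        print(seg)
```

Read left to right, each segment wraps the one before it: the first names the underlying store, and each later segment names an adapter applied on top.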
TODO:
- [x] Add unit tests and/or doctests in docstrings
- [x] Add docstrings and API docs for any new/modified user-facing classes and functions
- [x] New/modified features documented in `docs/user-guide/*.rst`
- [x] Changes documented as a new file in `changes/`
- [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)
closes #2943
fixes #2831
xref: https://github.com/zarr-developers/zeps/pull/48
cc @jbms
One tricky thing about `s3+https://endpoint/a/b` is that it is ambiguous whether it uses "virtual host" syntax (i.e. `endpoint` refers to a single bucket and the path is `a/b`) or "path" syntax (i.e. the bucket is `a` and the path is `b`).
The "path" syntax is generally the default when running a regular s3-compatible server, but the "virtual host" syntax can commonly occur when someone defines a CNAME DNS entry to map their own domain or subdomain to an AWS S3 bucket.
When designing this syntax for Neuroglancer, it seemed like it would be annoying to require users to use separate syntax to disambiguate the two cases. Instead, for operations where it matters (namely List), Neuroglancer just automatically determines which of the two cases applies by trying both ways and seeing which one succeeds, and then caching the result so that subsequent list operations don't require two requests.
Codecov Report
:x: Patch coverage is 67.57246% with 179 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 60.87%. Comparing base (ee9c182) to head (35526a5).
:warning: Report is 42 commits behind head on main.
:exclamation: There is a different number of reports uploaded between BASE (ee9c182) and HEAD (35526a5).
HEAD has 4 fewer uploads than BASE.
| Flag | BASE (ee9c182) | HEAD (35526a5) |
|---|---|---|
| | 14 | 10 |
Additional details and impacted files
@@ Coverage Diff @@
## main #3369 +/- ##
===========================================
- Coverage 94.92% 60.87% -34.06%
===========================================
Files 79 86 +7
Lines 9500 10231 +731
===========================================
- Hits 9018 6228 -2790
- Misses 482 4003 +3521
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/zarr/storage/_logging.py | 61.94% <ø> (-38.06%) | :arrow_down: |
| src/zarr/storage/__init__.py | 9.52% <0.00%> (-85.48%) | :arrow_down: |
| src/zarr/abc/__init__.py | 0.00% <0.00%> (ø) | |
| src/zarr/api/asynchronous.py | 71.42% <0.00%> (-19.45%) | :arrow_down: |
| src/zarr/storage/_common.py | 68.57% <85.00%> (-21.96%) | :arrow_down: |
| src/zarr/storage/_register_adapters.py | 62.50% <62.50%> (ø) | |
| src/zarr/registry.py | 63.19% <60.00%> (-25.63%) | :arrow_down: |
| src/zarr/storage/_zip.py | 72.10% <78.94%> (-25.50%) | :arrow_down: |
| src/zarr/abc/store_adapter.py | 39.02% <39.02%> (ø) | |
| src/zarr/storage/_zep8.py | 83.95% <83.95%> (ø) | |
| ... and 1 more | | |
Note: I have moved the URL pipeline proposal over from the ZEP repo to a separate repository:
https://github.com/jbms/url-pipeline