feat: Implement ZEP 8 URL syntax support for zarr-python

Open • ianhi opened this issue 2 months ago • 1 comment

Continuation of @jhamman's #3369, addressing my review comments. Claude's summary of the changes on top of Joe's PR:

  1. Format Propagation ✅
  • Zarr format segments (zarr2:, zarr3:) now propagate from URLs to array/group creation
  • Modified resolve_url_with_path() to return (Store, str, ZarrFormat | None)
  • Updated all API functions (create, open, open_array, open_group) to use the URL's format when the user doesn't specify one
  • 15 new tests for format propagation with comprehensive on-disk verification
  2. Nested Adapter Chains ✅
  • Implemented recursive URL resolution in ZipAdapter to support nested adapters
  • Added _create_nested_zip_store() method that handles nested ZEP 8 URLs
  • Enables ZEP 8 spec examples like file:outer.zip|zip:inner.zip|zip:data.zarr
  • 8 comprehensive tests covering arrays, groups, edge cases, and complex hierarchies
  3. Exception Handling Cleanup ✅
  • Removed all broad except Exception: handlers from _zep8.py
  • Simplified control flow by removing redundant try-except blocks
  • Only specific KeyError exceptions are caught, where appropriate
  • Parser exceptions now propagate naturally for easier debugging
  4. URL Logic Refactoring
  • Fixed s3+https scheme handling
  • Improved URL validation logic
  • Better separation of concerns between parsing and resolution
  5. Storage Options Validation
  • Enhanced storage_options handling and validation
  • Better error messages for invalid configurations
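
To make the format-propagation rule from item 1 concrete, here is a minimal, self-contained sketch. It is not the code in _zep8.py: split_pipeline, url_format, effective_format and FORMAT_SEGMENTS are hypothetical names, splitting on every "|" assumes no segment path contains a literal "|", and "the last format segment wins" is an assumption for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: format-segment scheme -> zarr_format (None = unversioned "zarr:").
FORMAT_SEGMENTS: Dict[str, Optional[int]] = {"zarr2": 2, "zarr3": 3, "zarr": None}


def split_pipeline(url: str) -> List[Tuple[str, str]]:
    """Split a ZEP 8 pipeline URL into (scheme, path) pairs, left to right."""
    segments = []
    for part in url.split("|"):
        scheme, sep, path = part.partition(":")
        if not sep or not scheme:
            raise ValueError(f"pipeline segment has no scheme: {part!r}")
        segments.append((scheme, path))
    return segments


def url_format(url: str) -> Optional[int]:
    """Return the zarr format named by the last format segment, if any."""
    fmt: Optional[int] = None
    for scheme, _path in split_pipeline(url):
        if scheme in FORMAT_SEGMENTS:
            fmt = FORMAT_SEGMENTS[scheme]
    return fmt


def effective_format(url: str, zarr_format: Optional[int] = None) -> Optional[int]:
    """An explicit zarr_format argument wins; otherwise fall back to the URL."""
    if zarr_format is not None:
        return zarr_format
    return url_format(url)


print(effective_format("file:data.zarr|zarr2:"))     # 2
print(effective_format("file:data.zarr|zarr3:", 2))  # 2, the caller's choice wins
print(effective_format("memory:|zarr:"))             # None
```

The precedence mirrors the summary above ("use the URL's format when the user doesn't specify one"); whether a conflicting explicit format should raise instead is left open here.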

Test Coverage: 167 tests passing (1 skipped), up from ~150 tests
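
The nested chain file:outer.zip|zip:inner.zip|zip:data.zarr from item 2 can be illustrated with the standard library alone: each zip: segment opens the bytes produced so far as an archive and selects a path inside it, so the final data.zarr acts as a key prefix within inner.zip. This is a hedged sketch, not _create_nested_zip_store(); open_nested_zip and make_zip are hypothetical helpers, and reading each inner archive fully into memory is a simplification.

```python
import io
import zipfile
from typing import Dict, List


def open_nested_zip(outer: bytes, members: List[str]) -> zipfile.ZipFile:
    """Open `outer` as a zip, then reopen each named member as a zip in turn."""
    current = zipfile.ZipFile(io.BytesIO(outer))
    for name in members:
        data = current.read(name)  # whole inner archive is read into memory
        current.close()
        current = zipfile.ZipFile(io.BytesIO(data))
    return current


def make_zip(files: Dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive, used here only as a test fixture."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


# Equivalent of file:outer.zip|zip:inner.zip|zip:data.zarr, with outer.zip held in memory.
inner = make_zip({"data.zarr/zarr.json": b'{"zarr_format": 3, "node_type": "group"}'})
outer = make_zip({"inner.zip": inner})
store = open_nested_zip(outer, ["inner.zip"])
print(store.read("data.zarr/zarr.json"))
```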


Missing ZEP 8 Features Analysis

Based on the ZEP 8 specification, here are the adapter schemes defined vs. implemented:

✅ Currently Implemented (9 adapters)

  • file: - FileSystemAdapter
  • memory: - MemoryAdapter
  • https:, http: - RemoteAdapter
  • s3:, s3+http:, s3+https: - S3Adapter (via RemoteAdapter)
  • gs: - GCSAdapter (via RemoteAdapter)
  • zip: - ZipAdapter
  • log: - LoggingAdapter (custom, not in spec)
  • zarr2:, zarr3:, zarr: - Format segments (handled by resolution layer)
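
One way to picture how a segment's scheme could be dispatched to the adapters listed above, including the s3+http:/s3+https: variants, is a small registry lookup. The ADAPTERS table and classify() are hypothetical and only reuse the adapter names from the list; they are not zarr-python's actual registration mechanism.

```python
from typing import Optional, Tuple

# Hypothetical registry mirroring the list above; not zarr-python's actual table.
ADAPTERS = {
    "file": "FileSystemAdapter",
    "memory": "MemoryAdapter",
    "http": "RemoteAdapter",
    "https": "RemoteAdapter",
    "s3": "S3Adapter",
    "gs": "GCSAdapter",
    "zip": "ZipAdapter",
    "log": "LoggingAdapter",
}
FORMAT_SEGMENTS = {"zarr", "zarr2", "zarr3"}
TRANSPORTS = {"http", "https"}


def classify(scheme: str) -> Tuple[str, Optional[str]]:
    """Map a segment scheme to (adapter name, transport override)."""
    if scheme in FORMAT_SEGMENTS:
        # Format segments are consumed by the resolution layer, not an adapter.
        return ("format-segment", None)
    base, plus, transport = scheme.partition("+")
    if base not in ADAPTERS:
        # A specific KeyError that a caller can choose to catch.
        raise KeyError(f"no adapter registered for scheme {scheme!r}")
    if plus and (base != "s3" or transport not in TRANSPORTS):
        raise ValueError(f"unsupported scheme variant {scheme!r}")
    return ADAPTERS[base], (transport or None)


print(classify("s3+https"))  # ('S3Adapter', 'https')
print(classify("zarr3"))     # ('format-segment', None)
```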

❌ Defined in Spec but Not Implemented (11 adapters)

Storage/Database Adapters:

  1. ocdbt: - OCDBT format (versioned KV store)
  2. icechunk: - Icechunk format (versioned Zarr store)

Compression Adapters:

  3. gzip: - Transparent gzip decompression
  4. zstd: - Transparent zstd decompression

Data Format Adapters:

  5. n5: - N5 format support
  6. tiff:, jpeg:, png:, bmp:, avif:, webp: - Image format adapters
  7. neuroglancer-precomputed: - Neuroglancer format
  8. json: - JSON pointer access

Utility Adapters:

  9. byte-range:start-end - Byte range extraction
  10. ..: - Parent directory traversal (for relative URLs)
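
Two of the missing byte-level adapters, byte-range: and gzip:, compose naturally when each segment transforms the output of the previous one. The sketch below assumes the end offset in start-end is exclusive; that is a guess, so check the spec's exact semantics. byte_range, gunzip, STEPS and apply_pipeline are hypothetical names.

```python
import gzip
from typing import Callable, Dict


def byte_range(data: bytes, spec: str) -> bytes:
    """Slice data by a "start-end" spec; end is treated as exclusive (assumption)."""
    start_s, sep, end_s = spec.partition("-")
    if not sep:
        raise ValueError(f"expected start-end, got {spec!r}")
    start, end = int(start_s), int(end_s)
    if not 0 <= start <= end <= len(data):
        raise ValueError(f"range {spec!r} out of bounds for {len(data)} bytes")
    return data[start:end]


def gunzip(data: bytes, _arg: str) -> bytes:
    """Transparent gzip decompression; the segment argument is ignored."""
    return gzip.decompress(data)


STEPS: Dict[str, Callable[[bytes, str], bytes]] = {"byte-range": byte_range, "gzip": gunzip}


def apply_pipeline(data: bytes, pipeline: str) -> bytes:
    """Apply each 'scheme:arg' segment to the output of the previous one."""
    for segment in pipeline.split("|"):
        scheme, _, arg = segment.partition(":")
        data = STEPS[scheme](data, arg)
    return data


payload = b"HEADER" + gzip.compress(b"hello zarr")
print(apply_pipeline(payload, f"byte-range:6-{len(payload)}|gzip:"))  # b'hello zarr'
```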

Other Missing Features:

  • Relative URL pipeline syntax - Spec lines 489-543 (explicitly noted as not supported in zarr-python implementation notes)
  • Format auto-detection - Spec lines 420-443 (MAY support, optional feature)
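
Format auto-detection could key off the metadata documents each format defines: Zarr v3 nodes carry a zarr.json document, while Zarr v2 uses .zarray for arrays and .zgroup for groups. The helper below is a hypothetical sketch over a set of store keys; preferring v3 when both kinds of metadata are present is an assumed tie-break, not documented behavior.

```python
from collections.abc import Container
from typing import Optional


def detect_zarr_format(keys: Container[str], prefix: str = "") -> Optional[int]:
    """Guess the zarr format of the node at `prefix` from its metadata keys."""
    p = prefix.rstrip("/") + "/" if prefix else ""
    if f"{p}zarr.json" in keys:
        return 3  # v3 metadata document (checked first: assumed tie-break)
    if f"{p}.zarray" in keys or f"{p}.zgroup" in keys:
        return 2  # v2 array or group metadata
    return None  # no recognizable metadata; caller must decide


print(detect_zarr_format({"data.zarr/zarr.json"}, "data.zarr"))  # 3
print(detect_zarr_format({".zgroup", ".zattrs"}))                 # 2
```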

ianhi, Nov 04 '25 18:11

Note: I have moved the URL pipeline proposal over from the ZEP repo to a separate repository:

https://github.com/jbms/url-pipeline

jbms, Nov 25 '25 01:11