zarr-python
Doc/v3 migration guide
Over the past few weeks, we've had a number of conversations/questions about the policy for backward compatibility, deprecations, and breaking changes with the upcoming 3.0 release. This doc is meant to help us iterate toward common language. In its initial form, it is not complete.
Goals for the text here:
- Developers of 2.18 and 3.0 should be able to decide if backward compatibility is a required attribute of a contribution
- Users of Zarr should be able to understand if their application will be impacted by the upcoming 3.0 release
- Users of Zarr should be able to make a plan for how they will adapt their usage of Zarr after the release
- [non-goal] This is not meant to provide a comprehensive listing of the changes to the zarr API
cc @zarr-developers/python-core-devs
One thing that might be helpful: what's the group's tolerance for either compatibility code or deprecations as a way to ease the transition? It sounds like strict backwards compatibility (perhaps with warnings) isn't a goal. Is there tolerance for things like https://github.com/zarr-developers/zarr-python/pull/2098 (e.g. restoring some properties to the Group object, loosening the keyword-only requirement for some functions)? Likewise for things like "cleaning up internal and user facing APIs", which could be done with a deprecation warning. I'm asking even if there isn't tolerance for backwards compatibility shims that really clash with the V3 spec or the current v3 implementation.
I just came here following the link from #1849. Currently, this guide feels very incomplete, and since #2182 is now in-flight, I think this guide should become very clear.
From the perspective of a maintainer / close user of a bunch of libraries that depend on zarr but have extremely limited maintainer time, the most important thing I want to understand is how hard it will be to support both zarr 2.18 and zarr 3+ within a single library. I think this is the critical question for a smooth transition, because it is hard for libraries to all migrate at the same time, and you want libraries to be installable together in the same environment — you don't want someone depending on both napari and ome-zarr to face napari requiring zarr>=3 and ome-zarr requiring zarr<3. So many libraries would want to support the subset of zarr that is identical in v3 and v2 until everyone can agree to depend on 3+.
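One way a downstream library might support both release lines is to branch on the installed major version in a single place. The following is only a minimal sketch: the helper names `major_version` and `is_zarr_v3` are hypothetical, and it assumes the version string starts with the numeric major component (as in "2.18.3" or "3.0.0a1").

```python
import re


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '3.0.0a1'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group())


def is_zarr_v3(version: str) -> bool:
    """True when the version string belongs to the 3.x (or later) line."""
    return major_version(version) >= 3


# In a real library the argument would come from the installed package,
# e.g. is_zarr_v3(zarr.__version__); literal strings are used here so the
# sketch runs without zarr installed.
flag_for_alpha = is_zarr_v3("3.0.0a1")
flag_for_2_18 = is_zarr_v3("2.18.3")
```

Keeping the check in one helper means the version-specific branches can be deleted in one pass once every dependent library requires 3+.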
@jni - thanks for the feedback. I agree this is not ready to ship yet. The main things that we know are changing are the Store API and access to internal APIs (e.g. zarr.core.xxx). Beyond that, the best way for us to fill out the migration guide is to have projects attempt to support v3 and report back. I'm doing that with Dask right now (https://github.com/dask/dask/pull/11388, https://github.com/zarr-developers/zarr-python/pull/2186) and I understand @TomAugspurger has begun the process for Xarray. So an ask for you and the Napari / ome-zarr devs is to try to do this and report back. Beyond that, specific suggestions to this doc are more than welcome (props to @dstansby for his edits already).
If we want downstream packages to test with version 3.0.0a1, it might be good to do a blog post or add something to the docs explaining how to do that testing, what to look for, and how to provide feedback?
Well, @Czaki started doing that for us in napari/napari#7215 and @d-v-b has been helping 🙏.
It looks like one of the remaining issues is that zarr.open defaults to v3 zarr (it'd be worth considering switching to calver, which I hate, if only to avoid the confusion between zarr format v3 and zarr-python v3... 😂), and tensorstore does not yet support v3 zarr files. At least that's my interpretation of these lines. Nor do I see any motions to change this in the tensorstore repo... @jbms?
But I think we can resolve this by explicitly writing a v2 zarr in the test?
specific suggestions to this doc are more than welcome
Something along the lines of:
Common functions have switched to keyword-only arguments, so you will need to change any invocation of, for example, zarr.open(path, 'a') to zarr.open(store=path, mode='a').
(An exhaustive list of such changes would be useful.)
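To illustrate what "keyword-only" means at a call site, here is a small sketch using a stand-in function. `open_stub` is hypothetical and is not zarr's real signature; it only mirrors the shape of the change described in the suggested text.

```python
def open_stub(*, store, mode="a"):
    """Hypothetical stand-in whose parameters are all keyword-only."""
    return {"store": store, "mode": mode}


# Keyword call: accepted.
result = open_stub(store="data.zarr", mode="a")

# Positional call: Python itself rejects it with a TypeError.
try:
    open_stub("data.zarr", "a")
    positional_ok = True
except TypeError:
    positional_ok = False
```

Spelling out the keywords at every call site is the form most likely to work unchanged across versions, since a keyword call does not depend on parameter order.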
and tensorstore does not yet support v3 zarr files.
Tensorstore has supported zarr v3 for a long time: https://google.github.io/tensorstore/driver/zarr3/index.html
Tensorstore has supported zarr v3 for a long time
oh, interesting, thanks for pointing that out @d-v-b! I couldn't actually find the relevant PR but did not look exhaustively. 🙏
A few changes I've found while updating xarray. Are all of these intentional?
- Array.resize returns a new Array object. 2.x mutated the Array in place
- zarr_version has been renamed to zarr_format
- Some exception types have changed (e.g. 2.x raised a zarr.errors.GroupNotFoundError while 3.x raises a ValueError)
- write_empty_chunks has been removed
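If these changes are intentional, a downstream library could absorb the first three with small shims. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the in-place `resize` of 2.x is assumed to return `None`, and the two tiny classes are fakes standing in for the two behaviours so the example runs without zarr installed.

```python
def format_keyword(is_v3: bool, fmt: int) -> dict:
    """Return the keyword argument that selects the on-disk format."""
    # 2.x called it zarr_version; 3.x renamed it to zarr_format.
    return {"zarr_format" if is_v3 else "zarr_version": fmt}


def resize_compat(array, new_shape: tuple):
    """Resize and return the object the caller should keep using."""
    result = array.resize(new_shape)
    # Assumption: an in-place resize returns None; otherwise use the new object.
    return array if result is None else result


def missing_group_errors(errors_module=None) -> tuple:
    """Exception types to catch when a group may not exist."""
    caught = [ValueError]  # reported above for 3.x
    legacy = getattr(errors_module, "GroupNotFoundError", None)
    if legacy is not None:
        caught.append(legacy)  # reported above for 2.x
    return tuple(caught)


class _InPlace:  # fake with the 2.x-style behaviour
    shape = (1,)

    def resize(self, new_shape):
        self.shape = new_shape


class _Returning:  # fake with the 3.x-style behaviour
    shape = (1,)

    def resize(self, new_shape):
        new = _Returning()
        new.shape = new_shape
        return new


old, new = _InPlace(), _Returning()
old_result = resize_compat(old, (4,))
new_result = resize_compat(new, (4,))
```

In real code `missing_group_errors` would be passed the `zarr.errors` module, and the returned tuple used directly in an `except` clause. The removal of write_empty_chunks is not covered here because the thread does not say what replaces it.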