
Doc/v3 migration guide

Open jhamman opened this issue 1 year ago • 8 comments

Over the past few weeks, we've had a number of conversations/questions about the policy for backward compatibility, deprecations, and breaking changes with the upcoming 3.0 release. This doc is meant to help us iterate toward common language. In its initial form, it is not complete.

Goals for the text here:

  1. Developers of 2.18 and 3.0 should be able to decide if backward compatibility is a required attribute of a contribution
  2. Users of Zarr should be able to understand if their application will be impacted by the upcoming 3.0 release
  3. Users of Zarr should be able to make a plan for how they will adapt their usage of Zarr after the release
  4. [non-goal] This is not meant to provide a comprehensive listing of the changes to the zarr API

cc @zarr-developers/python-core-devs

jhamman avatar Aug 19 '24 23:08 jhamman

One thing that might be helpful: what's the group's tolerance for either compatibility code or deprecations as a way to ease the transition? It sounds like strict backwards compatibility (perhaps with warnings) isn't a goal. Is there tolerance for things like https://github.com/zarr-developers/zarr-python/pull/2098 (e.g. restoring some properties to the Group object, loosening the keyword-only requirement for some functions)? Likewise for things like "cleaning up internal and user facing APIs", which could be done with a deprecation warning. I'm asking about these smaller cases even if there isn't tolerance for backwards compatibility shims that really clash with the V3 spec or the current v3 implementation.

TomAugspurger avatar Aug 20 '24 13:08 TomAugspurger

I just came here following the link from #1849. Currently, this guide feels very incomplete, and since #2182 is now in-flight, I think this guide should become very clear.

From the perspective of a maintainer / close user of a bunch of libraries that depend on zarr but have extremely limited maintainer time, the most important thing I want to understand is how hard it will be to support both zarr 2.18 and zarr 3+ within a single library. I think this is the critical question for a smooth transition, because it is hard for libraries to all migrate at the same time, and you want libraries to be installable together in the same environment — you don't want someone depending on both napari and ome-zarr to face napari requiring zarr>=3 and ome-zarr requiring zarr<3. So many libraries would want to support the subset of zarr that is identical in v3 and v2 until everyone can agree to depend on 3+.
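For what it's worth, a library that has to straddle both major versions usually starts by detecting which one is installed. Below is a minimal sketch using only the standard library. The helper names (`major_version`, `installed_zarr_major`) and the `IS_ZARR_V3` flag are made up for illustration and are not part of zarr; the only assumption about zarr itself is that the package is distributed under the name `zarr`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '3.0.0a1'."""
    return int(version_string.split(".")[0])


def installed_zarr_major() -> Optional[int]:
    """Major version of the installed zarr distribution, or None if absent."""
    try:
        return major_version(version("zarr"))
    except PackageNotFoundError:
        return None


# Hypothetical module-level flag a downstream library could branch on.
ZARR_MAJOR = installed_zarr_major()
IS_ZARR_V3 = ZARR_MAJOR is not None and ZARR_MAJOR >= 3
```

Code paths that differ between 2.18 and 3+ can then branch on `IS_ZARR_V3`, while everything in the shared subset stays unconditional.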

jni avatar Sep 16 '24 05:09 jni

@jni - thanks for the feedback. I agree this is not ready to ship yet. The main things that we know are changing are the Store API and access to internal APIs (e.g. zarr.core.xxx). Beyond that, the best way for us to fill out the migration guide is to have projects attempt to support v3 and report back. I'm doing that with Dask right now (https://github.com/dask/dask/pull/11388, https://github.com/zarr-developers/zarr-python/pull/2186) and I understand @TomAugspurger has begun the process for Xarray. So an ask for you and the Napari / ome-zarr devs is to try to do this and report back. Beyond that, specific suggestions to this doc are more than welcome (props to @dstansby for his edits already).

jhamman avatar Sep 17 '24 06:09 jhamman

If we want downstream packages to test with version 3.0.0a1, it might be good to do a blog post or add something to the docs explaining how to do that testing, what to look for, and how to provide feedback?

dstansby avatar Sep 17 '24 09:09 dstansby

Well, @Czaki started doing that for us in napari/napari#7215 and @d-v-b has been helping 🙏.

It looks like one of the remaining issues is that zarr.open defaults to v3 zarr (it'd be worth considering switching to calver, which I hate, if only to avoid the confusion between zarr format v3 and zarr-python v3... 😂), and tensorstore does not yet support v3 zarr files. At least that's my interpretation of these lines. Nor do I see any motions to change this in the tensorstore repo... @jbms?

But I think we can resolve this by explicitly writing a v2 zarr in the test?
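A rough sketch of what pinning the format in a test could look like. It assumes the keyword is `zarr_format` under zarr-python 3.x and `zarr_version` under 2.x (the rename is reported further down this thread). The helper `v2_format_kwargs` is hypothetical, and the `zarr.open` call is shown only as a comment because its exact behaviour on the alpha is untested here.

```python
def v2_format_kwargs(zarr_python_major: int) -> dict:
    """Keyword arguments that request the zarr v2 on-disk format.

    Assumes zarr-python 3.x spells the keyword `zarr_format` and
    zarr-python 2.x spells it `zarr_version`.
    """
    if zarr_python_major >= 3:
        return {"zarr_format": 2}
    return {"zarr_version": 2}


# Intended use inside a test (not executed here, untested assumption):
#   arr = zarr.open(store=path, mode="w", shape=(4,), dtype="i4",
#                   **v2_format_kwargs(zarr_major))
```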

> specific suggestions to this doc are more than welcome

Something along the lines of:

Common functions have switched to keyword-only arguments, so you will need to change any invocation of, for example, zarr.open(path, 'a') to zarr.open(store=path, mode='a').

(An exhaustive list of such changes would be useful.)
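To make the failure mode concrete, here is a small stand-in that does not import zarr at all. `open_like` is a hypothetical function whose second parameter is keyword-only, which is how the change is described above; the real `zarr.open` signature may differ in other details.

```python
def open_like(store=None, *, mode="a"):
    """Stand-in for a function whose `mode` parameter is keyword-only."""
    return {"store": store, "mode": mode}


def call_positionally() -> str:
    """Try the old positional style and report what happens."""
    try:
        open_like("data.zarr", "a")  # old style: mode passed positionally
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


# The keyword style works regardless of whether `mode` is keyword-only,
# so it is the spelling a library can use against both major versions.
result = open_like(store="data.zarr", mode="a")
```

The keyword spelling is the safe one for libraries supporting both versions, since a function that accepts an argument positionally also accepts it by name.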

jni avatar Sep 18 '24 11:09 jni

> and tensorstore does not yet support v3 zarr files.

Tensorstore has supported zarr v3 for a long time: https://google.github.io/tensorstore/driver/zarr3/index.html

d-v-b avatar Sep 18 '24 11:09 d-v-b

> Tensorstore has supported zarr v3 for a long time

oh, interesting, thanks for pointing that out @d-v-b! I couldn't actually find the relevant PR but did not look exhaustively. 🙏

jni avatar Sep 18 '24 11:09 jni

A few changes I've found while updating xarray. Are all of these intentional?

  • Array.resize returns a new Array object. 2.x mutated the Array in place
  • zarr_version has been renamed to zarr_format
  • Some exception types have changed (e.g. 2.x raised a zarr.errors.GroupNotFoundError while 3.x raises a ValueError)
  • write_empty_chunks has been removed
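
Until it is settled which of these are intentional, downstream code can paper over the `resize` difference with a tiny wrapper. This is a sketch only: `resize_compat` is a hypothetical helper, and the two stand-in classes merely mimic the behaviours reported above (mutating in place and returning None versus returning a new object) so that the example runs without zarr installed.

```python
def resize_compat(array, new_shape):
    """Resize `array` and return the object callers should keep using.

    Handles both reported behaviours: resize() mutating in place and
    returning None, or resize() returning a new array object.
    """
    result = array.resize(new_shape)
    return array if result is None else result


class InPlaceArray:
    """Stand-in mimicking the reported 2.x behaviour."""

    def __init__(self, shape):
        self.shape = shape

    def resize(self, new_shape):
        self.shape = new_shape  # mutate, return None implicitly


class ReturningArray:
    """Stand-in mimicking the reported 3.x behaviour."""

    def __init__(self, shape):
        self.shape = shape

    def resize(self, new_shape):
        return ReturningArray(new_shape)  # leave self untouched


old_style = resize_compat(InPlaceArray((10,)), (20,))
new_style = resize_compat(ReturningArray((10,)), (20,))
```

Callers then always rebind, e.g. `arr = resize_compat(arr, shape)`, which is correct under either behaviour.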

TomAugspurger avatar Oct 02 '24 12:10 TomAugspurger