
accelerating v2 -> v3 migration

Open d-v-b opened this issue 7 months ago • 8 comments

It's been several months since we released zarr-python 3 and there are still many active projects using zarr-python 2. For people deeply invested in the zarr-python 2 store API, migration to zarr-python 3 may not be easy, since the store API is very different. With this in mind, I think we should explore options for making migration from the zarr-python 2 APIs to the zarr-python 3 APIs easier.

A few ideas:

  • a v2 namespace in zarr-python 3 that contains all the code from zarr-python 2.x. See this zulip post, and this PR
  • wrapper classes that can encapsulate a zarr-python-2-compatible store API in a zarr-python-3-compatible store. I think the v3 MemoryStore is a good target for this. This might be of interest to people who wrote a lot of zarr-python-v2-compatible stores that would be onerous to directly migrate (cc @cgohlke)
  • a rational approach to codecs. this is a longer conversation.

Any other ideas?

d-v-b avatar May 21 '25 10:05 d-v-b

I'd add:

  • A complete migration guide, listing exactly how to translate every part of the v2 API to the v3 API
  • A tool to convert v2 data to v3 data in-place, without copying any data

dstansby avatar May 21 '25 10:05 dstansby

Is this not already covered in the existing migration guide?

A wrapper class seems very useful, and might even expose some incompatibilities. You could also make a wrapper class and then raise deprecation warnings when it gets used.
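
To make the wrapper idea concrete, here is a minimal sketch using only the standard library. It assumes a v2-style store is a `MutableMapping` from string keys to bytes, and it exposes a simplified async `get`/`set`/`delete`/`exists` surface. The class name `V2StoreAdapter` and its methods are hypothetical; this is not the actual zarr-python 3 `Store` interface, which has more methods and different argument types.

```python
import asyncio
import warnings
from collections.abc import MutableMapping
from typing import Optional


class V2StoreAdapter:
    """Hypothetical adapter: async access over a v2-style mapping store.

    A simplified stand-in, NOT the real zarr-python 3 store interface.
    """

    def __init__(self, v2_store: MutableMapping):
        # Nudge users toward porting their store, as suggested in the thread.
        warnings.warn(
            "wrapping a v2-style store; consider porting it to the v3 store API",
            DeprecationWarning,
            stacklevel=2,
        )
        self._store = v2_store

    async def get(self, key: str) -> Optional[bytes]:
        # Run the blocking mapping lookup in a worker thread so the
        # event loop is not stalled by slow I/O inside the v2 store.
        return await asyncio.to_thread(self._store.get, key)

    async def set(self, key: str, value: bytes) -> None:
        await asyncio.to_thread(self._store.__setitem__, key, value)

    async def delete(self, key: str) -> None:
        # Missing keys are silently ignored; that is a choice made for this sketch.
        await asyncio.to_thread(self._store.pop, key, None)

    async def exists(self, key: str) -> bool:
        return await asyncio.to_thread(self._store.__contains__, key)


async def demo() -> tuple:
    adapter = V2StoreAdapter({})  # a plain dict stands in for a v2 store
    await adapter.set("a/0.0", b"chunk")
    found = await adapter.get("a/0.0")
    await adapter.delete("a/0.0")
    return found, await adapter.exists("a/0.0")
```

Pushing each call through `asyncio.to_thread` keeps the event loop responsive, but every operation still occupies a thread, so this "works" without being truly async-native.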

TomNicholas avatar May 21 '25 12:05 TomNicholas

> a v2 namespace in zarr-python 3 that contains all the code from zarr-python 2.x. See this zulip post, and this https://github.com/zarr-developers/zarr-python/pull/3075

This was discussed at length in the run-up to the 3.0 release. Ultimately, we chose to remove the 2.18.X code from the release. With that in mind, I'm curious what has changed that would have us reverse this. I'm not entirely opposed to it, but I'd like to think through the process a bit.

> wrapper classes that can encapsulate a zarr-python-2-compatible store API in a zarr-python-3-compatible store. I think the v3 MemoryStore is a good target for this. This might be of interest to people who wrote a lot of zarr-python-v2-compatible stores that would be onerous to directly migrate (cc @cgohlke)

This is a good idea. It will not be easy to make async-friendly, but it will "work".

> a rational approach to codecs. this is a longer conversation.

From the perspective of Xarray users, getting the 3.0 dtypes and codecs into a stable state is the highest priority at this point.

jhamman avatar May 21 '25 13:05 jhamman

A few v2 features still don't exist or don't work properly in v3, which blocks full migration of our workflows (at least for us).

  1. Struct data type support https://github.com/zarr-developers/zarr-python/issues/2134
  2. FSSpec Caching doesn't work https://github.com/zarr-developers/zarr-python/issues/2988
  3. Synchronization primitives don't exist (we did parallel overlapping writes with Thread/Process locks in v2)
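
For readers unfamiliar with the third item, the pattern is roughly a lock per chunk guarding each read-modify-write. The sketch below uses only the standard library; `ChunkLocks` and `add_to_chunk` are hypothetical names, a plain dict of integers stands in for a chunk store, and nothing here is an existing zarr-python 3 API.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class ChunkLocks:
    """Hypothetical per-chunk lock registry (not part of zarr-python 3)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the registry itself
        self._locks = defaultdict(threading.Lock)

    def __getitem__(self, chunk_key: str) -> threading.Lock:
        with self._guard:
            return self._locks[chunk_key]


def add_to_chunk(store: dict, locks: ChunkLocks, chunk_key: str, amount: int) -> None:
    # Hold the chunk's lock for the whole read-modify-write so that
    # overlapping writers cannot interleave and lose updates.
    with locks[chunk_key]:
        current = store.get(chunk_key, 0)
        store[chunk_key] = current + amount


def demo() -> int:
    store, locks = {}, ChunkLocks()
    # Leaving the "with" block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for _ in range(1000):
            pool.submit(add_to_chunk, store, locks, "c/0/0", 1)
    return store["c/0/0"]
```

A process-level equivalent would need file-based or otherwise shared locks rather than `threading.Lock`.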

tasansal avatar May 21 '25 14:05 tasansal

Just a comment from a user perspective, adding onto the previous comment.

We delay(ed) the move to v3 not because of migration difficulties, but due to a few missing features (e.g. copying stores), and due to pyodide being unable to deal with async code (yet).

FabricioArendTorres avatar May 22 '25 12:05 FabricioArendTorres

Adding back some kind of least-recently-used cache would be very helpful too. (like LRUCache in v2)
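
As a rough illustration of what such a cache does, here is a standard-library sketch of a read-through LRU layer over any mapping-like store. `LRUReadCache` is a hypothetical name, and the sketch bounds the cache by item count for simplicity, which is an assumption of this example and not necessarily how a real store cache should be sized.

```python
from collections import OrderedDict
from collections.abc import Mapping


class LRUReadCache:
    """Hypothetical read-through LRU cache over a mapping-like store."""

    def __init__(self, store: Mapping, max_items: int):
        self._store = store
        self._max_items = max_items
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        if key in self._cache:
            # Mark the entry as most recently used.
            self._cache.move_to_end(key)
            self.hits += 1
            return self._cache[key]
        value = self._store[key]  # fall through to the slow store
        self.misses += 1
        self._cache[key] = value
        if len(self._cache) > self._max_items:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return value
```

This handles reads only; a usable version would also need to invalidate entries on writes and deletes.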

dstansby avatar May 30 '25 10:05 dstansby

Just going to link this here for reference: the latest tifffile supports zarr3, but performance is much worse than with zarr2 for real images (large arrays, non-optimal chunks). This is discussed starting here: https://github.com/cgohlke/tifffile/issues/297#issuecomment-2905785157 Using this, over on the napari-tiff plugin side, we've updated to support zarr3, but I regret it a bit because performance has regressed so much on real whole-slide images.

psobolewskiPhD avatar May 30 '25 16:05 psobolewskiPhD

@psobolewskiPhD - would you mind opening a separate issue to discuss performance regressions? We'd love to understand the issue here but we've seen the opposite result in many zarr3 applications so we'll need to dig in to be helpful. Anything you can provide in terms of a reproducer would be great.

jhamman avatar May 30 '25 17:05 jhamman