accelerating v2 -> v3 migration
It's been several months since we released zarr-python 3 and there are still many active projects using zarr-python 2. For people deeply invested in the zarr-python 2 store API, migration to zarr-python 3 may not be easy, since the store API is very different. With this in mind, I think we should explore options for making migration from the zarr-python 2 APIs to the zarr-python 3 APIs easier.
A few ideas:
- a v2 namespace in zarr-python 3 that contains all the code from zarr-python 2.x. See this zulip post, and this PR.
- wrapper classes that can encapsulate a zarr-python-2-compatible store API in a zarr-python-3-compatible store. I think the v3 MemoryStore is a good target for this. This might be of interest to people who wrote a lot of zarr-python-v2-compatible stores that would be onerous to directly migrate (cc @cgohlke).
- a rational approach to codecs. This is a longer conversation.
Any other ideas?
I'd add:
- A complete migration guide, listing exactly how to translate every part of the v2 API to the v3 API
- A tool to convert v2 data to v3 data in-place, without copying any data
Is this not already covered in the existing migration guide?
A wrapper class seems very useful, and might even expose some incompatibilities. You could also make a wrapper class and then raise deprecation warnings when it gets used.
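To make the deprecation-warning idea concrete, here is a minimal sketch. It assumes a v2-style store is just a dict-like `MutableMapping`; the class name `DeprecatedV2StoreWrapper` and the warning text are hypothetical and not part of zarr-python.

```python
import warnings
from collections.abc import MutableMapping


class DeprecatedV2StoreWrapper(MutableMapping):
    """Hypothetical wrapper: forwards to a v2-style (dict-like) store.

    Emits a DeprecationWarning when constructed so users notice that they
    are still relying on the old store API.
    """

    def __init__(self, v2_store):
        warnings.warn(
            "v2-style stores are wrapped for compatibility only; "
            "please migrate to the v3 store API",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this class
        )
        self._store = v2_store

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)
```

Warning once at construction keeps the noise down; warning on every access would probably be too chatty for chunk-heavy workloads.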
> a v2 namespace in zarr-python 3 that contains all the code from zarr-python 2.x. See this zulip post, and this PR: https://github.com/zarr-developers/zarr-python/pull/3075
This was discussed at length in the run-up to the 3.0 release. Ultimately, we chose to remove the 2.18.X code from the release. With that in mind, I'm curious what has changed that would have us reverse this? I'm not entirely opposed to it, but I'd like to think through the process a bit.
> wrapper classes that can encapsulate a zarr-python-2-compatible store API in a zarr-python-3-compatible store. I think the v3 MemoryStore is a good target for this. This might be of interest to people who wrote a lot of zarr-python-v2-compatible stores that would be onerous to directly migrate (cc @cgohlke)
This is a good idea. It will not be easy to make async-friendly, but it will "work".
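A rough sketch of what "works but is not really async-friendly" could look like: the blocking dict-style calls of a v2 store are pushed to a worker thread with `asyncio.to_thread`. This is an assumption-laden illustration, not zarr-python's actual `Store` interface; the class `V2StoreAdapter` and its method names (`get`, `set`, `delete`, `exists`) are hypothetical.

```python
import asyncio
from collections.abc import MutableMapping


class V2StoreAdapter:
    """Hypothetical adapter: async methods over a v2-style mapping store.

    NOTE: this does not implement zarr-python 3's Store ABC; the method
    names here are illustrative only.
    """

    def __init__(self, v2_store: MutableMapping):
        self._store = v2_store

    async def get(self, key):
        # Run the blocking lookup in a worker thread so the event loop
        # is not stalled by slow I/O inside the wrapped store.
        def _read():
            try:
                return self._store[key]
            except KeyError:
                return None  # missing keys are reported as None

        return await asyncio.to_thread(_read)

    async def set(self, key, value):
        await asyncio.to_thread(self._store.__setitem__, key, value)

    async def delete(self, key):
        # pop with a default so deleting a missing key is a no-op
        await asyncio.to_thread(self._store.pop, key, None)

    async def exists(self, key):
        return await asyncio.to_thread(self._store.__contains__, key)


async def demo():
    adapter = V2StoreAdapter({})
    await adapter.set("x/0.0", b"abc")
    return await adapter.get("x/0.0")
```

Each call still occupies a thread for the duration of the underlying I/O, so this buys compatibility rather than real concurrency.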
> a rational approach to codecs. this is a longer conversation.
From the perspective of Xarray users, getting the 3.0 dtypes and codecs into a stable state is the highest priority at this point.
A few v2 features still don't exist or don't work properly in v3, which blocks full migration of our workflows (at least for us).
- Struct data type support https://github.com/zarr-developers/zarr-python/issues/2134
- FSSpec Caching doesn't work https://github.com/zarr-developers/zarr-python/issues/2988
- Synchronization primitives don't exist (we did parallel overlapping writes with Thread/Process locks in v2)
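For readers unfamiliar with the last point, the pattern is roughly the following: overlapping writes that touch the same chunk do a read-modify-write, so each chunk needs its own lock. This is a minimal thread-only sketch over a plain dict; `ChunkLocks` and `add_to_chunk` are hypothetical names, and the 4-element list stands in for a real chunk.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ChunkLocks:
    """Hypothetical per-chunk lock registry (one threading.Lock per chunk key)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the registry itself

    def __getitem__(self, chunk_key):
        with self._guard:
            return self._locks.setdefault(chunk_key, threading.Lock())


def add_to_chunk(store, locks, chunk_key, index, amount):
    # Read-modify-write of a single chunk, serialized by that chunk's lock.
    with locks[chunk_key]:
        chunk = list(store.get(chunk_key, [0, 0, 0, 0]))
        chunk[index] += amount
        store[chunk_key] = chunk


store = {}
locks = ChunkLocks()
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [
        pool.submit(add_to_chunk, store, locks, "0.0", i % 4, 1)
        for i in range(400)
    ]
    for future in futures:
        future.result()  # re-raise any worker exception
```

Without the lock, two writers could read the same chunk, modify different elements, and the later write would silently drop the earlier one. A multi-process version would need file-based or otherwise shared locks instead of `threading.Lock`.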
Just a comment from a user perspective, adding onto the previous comment.
We delay(ed) the move to v3 not because of migration difficulties, but due to a few missing features (e.g. copying stores), and due to pyodide being unable to deal with async code (yet).
Adding back some kind of least-recently-used cache would be very helpful too. (like LRUCache in v2)
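As an illustration of what such a cache does, here is a minimal read-through LRU sketch over a dict-like store, bounded by total cached bytes. It is not the v2 implementation; `LRUReadCache`, `max_size`, and the `hits`/`misses` counters are names chosen for this example.

```python
from collections import OrderedDict


class LRUReadCache:
    """Hypothetical read-through LRU cache over a dict-like store."""

    def __init__(self, store, max_size):
        self._store = store
        self._max_size = max_size      # cache budget in bytes
        self._cache = OrderedDict()    # least recently used entry first
        self._current_size = 0
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._cache[key]
        value = self._store[key]          # fall through to the real store
        self.misses += 1
        self._remember(key, value)
        return value

    def __setitem__(self, key, value):
        self._store[key] = value
        self._forget(key)                 # drop any stale cached copy

    def _forget(self, key):
        old = self._cache.pop(key, None)
        if old is not None:
            self._current_size -= len(old)

    def _remember(self, key, value):
        if len(value) > self._max_size:
            return                        # too large to cache at all
        self._cache[key] = value
        self._current_size += len(value)
        while self._current_size > self._max_size:
            _, evicted = self._cache.popitem(last=False)
            self._current_size -= len(evicted)
```

Usage would look like `cache = LRUReadCache(backing_store, max_size=2**28)` followed by `cache["0.0"]`; repeated reads of hot chunks are then served from memory.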
Just going to link this here for reference: the latest tifffile supports zarr3, but performance is much worse than with zarr2 for real images (large arrays, non-optimal chunks). This is discussed starting here: https://github.com/cgohlke/tifffile/issues/297#issuecomment-2905785157 Using this, over on the napari-tiff plugin side, we've updated to support zarr3, but I regret it a bit because performance has regressed so much on real whole-slide images.
@psobolewskiPhD - would you mind opening a separate issue to discuss performance regressions? We'd love to understand the issue here but we've seen the opposite result in many zarr3 applications so we'll need to dig in to be helpful. Anything you can provide in terms of a reproducer would be great.