Re-implement `loadable_variables` using ManifestStore

Open TomNicholas opened this issue 10 months ago • 0 comments

https://github.com/zarr-developers/VirtualiZarr/issues/124 describes @ayushnag's idea to allow loading data variables directly from the in-memory ManifestArrays, rather than having to write to kerchunk/icechunk and then read back from that. PR #458 will add a zarr-compliant in-memory virtual ManifestStore that wraps a virtual dataset and would allow loading data from it via:

import virtualizarr as vz
import xarray as xr

virtual_ds = vz.open_virtual_dataset(filepath)
manifeststore = vz.ManifestStore(virtual_ds)
lazy_ds = xr.open_zarr(manifeststore)
loaded_ds = lazy_ds.load()

This issue is to track the idea that once #458 is implemented we should refactor the implementation of loadable_variables to use this ManifestStore + xr.open_zarr approach internally, for all backends. Currently this is instead done by each virtual backend calling out to a different xarray backend, depending on the filetype.
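
To make the shape of that refactor concrete, here is a minimal sketch of a single loading path shared by all backends. It is only an illustration under stated assumptions: `resolve_variables` and `load_from_store` are hypothetical names, not VirtualiZarr API, and plain strings and a dict lookup stand in for ManifestArrays and for reading through ManifestStore + xr.open_zarr.

```python
from typing import Any, Callable, Iterable, Mapping


def resolve_variables(
    virtual_vars: Mapping[str, Any],
    loadable_variables: Iterable[str],
    load_from_store: Callable[[str], Any],
) -> dict[str, Any]:
    """Load the requested variables through one store-backed path.

    Everything not named in ``loadable_variables`` stays virtual.
    All names here are hypothetical stand-ins, not VirtualiZarr API.
    """
    loadable = set(loadable_variables)
    unknown = loadable - set(virtual_vars)
    if unknown:
        raise KeyError(f"not in dataset: {sorted(unknown)}")
    return {
        name: load_from_store(name) if name in loadable else ref
        for name, ref in virtual_vars.items()
    }


# Toy stand-ins: strings play the role of ManifestArrays, and a dict lookup
# plays the role of reading through ManifestStore + xr.open_zarr.
virtual_vars = {"time": "<virtual time>", "temp": "<virtual temp>"}
chunk_data = {"time": [0, 1, 2], "temp": [280.0, 281.5, 279.9]}

resolved = resolve_variables(virtual_vars, ["time"], chunk_data.__getitem__)
print(resolved)
```

The point of the sketch is that the filetype never appears: each virtual backend would only need to produce the virtual variables, and the loading step would be the same regardless of where they came from.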

There are multiple reasons to re-implement this:

  1. We would no longer need every virtual backend to have a corresponding xarray backend,
  2. We would be able to guarantee (and create property-based tests - see #394) that loading data via loadable_variables will give the same result as creating a virtual dataset, writing to icechunk, then loading,
  3. It would be easier to centralize file handle management, so we can close file handles in the way xarray can (see https://github.com/zarr-developers/VirtualiZarr/issues/468).
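
As an illustration of the third point, one possible way to centralize handle ownership is a small registry built on `contextlib.ExitStack`. This is a sketch only: `HandleRegistry` is a hypothetical name, it uses local files for simplicity, and nothing here reflects how VirtualiZarr or xarray actually manage handles.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path
from typing import IO


class HandleRegistry:
    """Owns every file handle opened while loading, and closes them together."""

    def __init__(self) -> None:
        self._stack = ExitStack()
        self._handles: dict[str, IO[bytes]] = {}

    def open(self, path: str) -> IO[bytes]:
        # Reuse an already-open handle rather than opening the file twice.
        if path not in self._handles:
            self._handles[path] = self._stack.enter_context(open(path, "rb"))
        return self._handles[path]

    def close(self) -> None:
        # ExitStack closes the handles in reverse order of opening.
        self._stack.close()
        self._handles.clear()

    def __enter__(self) -> "HandleRegistry":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chunk.bin"
    path.write_bytes(b"\x00\x01\x02")
    with HandleRegistry() as registry:
        first = registry.open(str(path))
        second = registry.open(str(path))  # same handle, not a second open
        payload = first.read()
    # Leaving the with-block closed every handle the registry opened.
```

With a single loading path, one object like this could own all handles, instead of each xarray backend managing its own.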

FYI @chuckwondo

TomNicholas, Mar 04 '25 16:03