
datatree backend for opening grib files

Open TomNicholas opened this issue 1 year ago • 3 comments

Recently a way of kerchunking GRIB data as a DataTree object was added in https://github.com/fsspec/kerchunk/pull/399. Since the ongoing xarray-datatree integration is adding an open_datatree method to xarray's BackendEntrypoint classes, it's likely that we could write an open_datatree method that understands how to read a GRIB file and returns a datatree containing ManifestArray objects.

TomNicholas, Mar 08 '24 23:03

We actually don't need to wait for anything upstream in xarray before making something useful here. We could simply create a new virtualizarr.open_virtual_datatree function, which would detect the filetype, loop over the groups, and use open_virtual_dataset (or kerchunk directly if necessary) to first create the virtual xr.Dataset objects, then put them all into a datatree.DataTree to return. This function could be modelled after how datatree.open_datatree currently works.
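The "loop over the groups" step could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing VirtualiZarr API: collect_virtual_groups, the open_group callable and the file name are all hypothetical, and the sketch stops at a path-keyed dict, which is the input form that DataTree.from_dict accepts (with "/" as the root node).

```python
from typing import Any, Callable, Dict, Iterable


def collect_virtual_groups(
    filepath: str,
    group_paths: Iterable[str],
    open_group: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Open every group of one file and key the results by node path.

    `open_group(filepath, group)` is a hypothetical stand-in for whatever
    opens a single group as a virtual dataset (for example a thin wrapper
    around open_virtual_dataset, or around kerchunk directly).
    """
    datasets: Dict[str, Any] = {}
    for group in group_paths:
        # Normalise to an absolute, slash-separated node path: "a/b" -> "/a/b".
        # An empty group name maps to the root node "/".
        node_path = "/" + group.strip("/")
        datasets[node_path] = open_group(filepath, group)
    return datasets


# Stand-in opener (hypothetical) that only records what it was asked to open.
def fake_open_group(filepath: str, group: str) -> str:
    return f"virtual dataset for {filepath}:{group or '/'}"


groups = collect_virtual_groups(
    "forecast.grib2",  # hypothetical file name
    ["", "surface", "isobaric/t"],
    fake_open_group,
)
# With a real opener returning xr.Dataset objects, the tree would then be
# built with DataTree.from_dict(groups).
```

Passing the opener in as an argument keeps the filetype detection separate from the tree assembly, so the same loop would serve GRIB, nested HDF5 or anything else that exposes groups.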

At that point you would have a datatree.DataTree object wrapping lots of ManifestArray objects (let's call it vdt1 for "virtual datatree 1"). You could concatenate two such trees using

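import xarray as xr
from datatree import map_over_subtree

# map_over_subtree wraps a function so that it is applied to each pair of matching nodes
concat_over_time = map_over_subtree(lambda ds1, ds2: xr.concat([ds1, ds2], dim='time'))
combined_virtual_tree = concat_over_time(vdt1, vdt2)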

(cc @maxrjones, who asked about doing something similar but for nested HDF5 files)

TomNicholas, Mar 27 '24 20:03

@TomNicholas does this mean we can use VirtualiZarr with GRIB files already, or do we need to wait for #312?

cr458, Jul 24 '25 20:07

VirtualiZarr does not ship with support for GRIB yet. Notice there is no GRIBParser of any kind in the virtualizarr.parsers namespace. At least one implementation is being actively worked on though: https://github.com/virtual-zarr/hrrr-parser

TomNicholas, Jul 24 '25 21:07