Error reading inlined reference data when trying to roundtrip virtual dataset
Hi,
I've been working on using VirtualiZarr to create virtual datasets of GOES/Himawari/GK2A AWS data. The end goal is to generate virtual references appended along the t dimension for GOES data. One of the main issues I've encountered so far is that after writing the virtual dataset to disk, reading it back results in this error:
NotImplementedError: Reading inlined reference data is currently not supported. [ToDo]
I can read it back with xr.open_dataset, but I want to eventually combine all the virtual references into one large virtual dataset, so I need to be able to open them as virtual datasets again. Reproducer:
import xarray as xr
from virtualizarr import open_virtual_dataset
import s3fs

fs = s3fs.S3FileSystem(anon=True)
filepath = "s3://noaa-goes19/ABI-L1b-RadF/2025/001/00/OR_ABI-L1b-RadF-M6C08_G19_s20250010010205_e20250010019513_c20250010019570.nc"

vd = open_virtual_dataset(
    filepath,
    loadable_variables=["t"],
    reader_options={"storage_options": {"anon": True}},
)
print(vd)

vd.virtualize.to_kerchunk("g19.json", format="json")

# opening with xarray's kerchunk engine works
d = xr.open_dataset(
    "g19.json",
    engine="kerchunk",
    backend_kwargs={"storage_options": {"remote_options": {"anon": True}}},
)
print(d)

# reopening as a *virtual* dataset raises the NotImplementedError above
vd1 = open_virtual_dataset("g19.json", filetype="kerchunk")
print(vd1)
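For context, the end goal would look roughly like this (a sketch only; filepaths is a hypothetical list of GOES file URLs, and the concat kwargs follow the usual VirtualiZarr combining recipe):

# sketch: build one virtual dataset per file, then concatenate along "t"
vds_list = [
    open_virtual_dataset(
        fp,
        loadable_variables=["t"],
        reader_options={"storage_options": {"anon": True}},
    )
    for fp in filepaths  # hypothetical list of file URLs
]
combined = xr.concat(
    vds_list, dim="t", coords="minimal", compat="override", combine_attrs="override"
)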
I get that it's not supported yet, but either way: how would I have VirtualiZarr not write inlined data, or is there any information/resources pointing to where I could try adding support for it?
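As far as I can tell, the inlined bytes come from loadable_variables, so dropping them avoids inlined data entirely (a sketch, with the obvious trade-off that "t" then stays virtual instead of loaded):

# sketch: no loadable_variables, so nothing should be written inline
vd = open_virtual_dataset(
    filepath,
    reader_options={"storage_options": {"anon": True}},
)
vd.virtualize.to_kerchunk("g19_novars.json", format="json")
# this version should reopen as a virtual dataset again
vd1 = open_virtual_dataset("g19_novars.json", filetype="kerchunk")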
Versions: xarray 2025.1.2, s3fs 2025.2.0, virtualizarr 1.3.2
Also, not sure if this is related, but doing the above and trying to write to a Parquet file also fails:
import xarray as xr
from virtualizarr import open_virtual_dataset
import s3fs

fs = s3fs.S3FileSystem(anon=True)
filepath = "s3://noaa-goes19/ABI-L1b-RadF/2025/001/00/OR_ABI-L1b-RadF-M6C08_G19_s20250010010205_e20250010019513_c20250010019570.nc"

vd = open_virtual_dataset(
    filepath,
    loadable_variables=["t"],
    reader_options={"storage_options": {"anon": True}},
)
print(vd)

# writing parquet references fails
vd.virtualize.to_kerchunk("g19.parquet", format="parquet")

which raises:

KeyError: 'algorithm_dynamic_input_data_container/.zarray'
This is an important and annoying missing feature when serializing to kerchunk. Thanks for raising it explicitly.
However, if you use Icechunk as a serialization format instead, you won't have this problem.
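Roughly like this (an untested sketch against recent icechunk releases; note that reading the virtual chunks back from S3 also requires authorizing virtual chunk access on the repo, which I've omitted here):

import icechunk

# sketch: write the virtual references into a local Icechunk repo
storage = icechunk.local_filesystem_storage("g19_icechunk")
repo = icechunk.Repository.create(storage)

session = repo.writable_session("main")
vd.virtualize.to_icechunk(session.store)
session.commit("add GOES-19 virtual references")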
Okay, thank you. I did also try using Icechunk for this same example, snippet here: https://github.com/zarr-developers/VirtualiZarr/issues/485#issuecomment-2725543238, but ran into issues decoding the FillValue there as well.
I found a workaround for the case where you only have inlined coordinates:
import xarray as xr
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import KerchunkParquetParser


def open_virtual_with_inlined(uri, registry):
    # eagerly load the inlined coordinates via the kerchunk engine
    kds = xr.open_dataset(uri, engine="kerchunk")
    # build the virtual dataset, skipping the coords that were inlined
    mds = open_virtual_dataset(
        uri,
        registry=registry,
        parser=KerchunkParquetParser(skip_variables=list(kds.coords)),
    )
    # re-attach the eagerly loaded coordinates to the virtual dataset
    for k in kds.coords:
        mds.coords[k] = kds[k]
    return mds
The resulting mds can be stored with .vz.to_kerchunk again, and it seems to create inlined bytes for the coords 😆
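For reference, hypothetical usage of the helper above (the store/registry setup is an assumption based on the v2 docs, not tested):

from obstore.store import LocalStore, S3Store
from virtualizarr.registry import ObjectStoreRegistry

# assumed setup: the registry must resolve both the local reference file
# and the S3 bucket that the virtual chunks point at
registry = ObjectStoreRegistry({
    "file://": LocalStore(),
    "s3://noaa-goes19": S3Store("noaa-goes19", region="us-east-1", skip_signature=True),
})

mds = open_virtual_with_inlined("file:///path/to/g19.parquet", registry)
mds.vz.to_kerchunk("g19_roundtrip.json", format="json")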