Tom Nicholas
Currently the `ChunkManifest` is hardcoded to use numpy arrays underneath to store the paths/offsets/byte ranges. However, there are a few cases where we might want to use another format: -...
Compare Kerchunk JSON/Parquet/Icechunk
- [ ] Closes #xxxx
- [ ] Tests added
- [ ] Tests passing
- [ ] Full type hint coverage
- [ ] Changes are documented in `docs/releases.rst`...
It would be neat to add a method `vds.nrefs()` to the accessor, which returns the total number of virtual references in the dataset. It's useful to know how many millions...
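To make the idea concrete, here is a rough sketch of the counting logic only. It assumes each variable's manifest can be viewed as a plain mapping from chunk key to a `(path, offset, length)` entry, and that an empty path means a missing chunk. `count_refs` and the sample data are hypothetical stand-ins, not VirtualiZarr API.

```python
from typing import Mapping

# Hypothetical stand-in for a manifest: chunk key -> (path, offset, length).
Manifest = Mapping[str, tuple[str, int, int]]


def count_refs(manifests: Mapping[str, Manifest]) -> int:
    """Return the total number of chunk references across all variables.

    Assumption: an entry with an empty path marks a missing chunk,
    so it is not counted as a reference.
    """
    return sum(
        1
        for manifest in manifests.values()
        for path, _offset, _length in manifest.values()
        if path
    )


# Made-up example data: two variables, one of which has a missing chunk.
example = {
    "temp": {
        "0.0": ("s3://bucket/a.nc", 0, 100),
        "0.1": ("s3://bucket/a.nc", 100, 100),
    },
    "salt": {
        "0.0": ("s3://bucket/b.nc", 0, 50),
        "0.1": ("", 0, 0),  # missing chunk, skipped
    },
}

total = count_refs(example)
print(total)
```

A real accessor method would presumably walk the dataset's virtual variables and sum over their manifests in the same way.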
Icechunk now has a `validate_containers` kwarg to `.set_virtual_ref`/`.set_virtual_refs`. We should publicly expose this in `.virtualize.to_icechunk()`. I think we want to make `True` the default and expose it for overriding. The...
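A minimal sketch of the "default to `True`, but allow overriding" pattern follows. `RecordingStore` and `write_refs` are hypothetical stand-ins that only record calls; they are not the Icechunk or VirtualiZarr API, and the kwarg name is simply the one quoted above.

```python
from typing import Mapping


class RecordingStore:
    """Hypothetical stand-in for a store: records calls, writes nothing."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, int, int, bool]] = []

    def set_virtual_ref(
        self,
        key: str,
        location: str,
        *,
        offset: int,
        length: int,
        validate_containers: bool,
    ) -> None:
        self.calls.append((key, location, offset, length, validate_containers))


def write_refs(
    store: RecordingStore,
    refs: Mapping[str, tuple[str, int, int]],
    *,
    validate_containers: bool = True,  # safe default, overridable by the caller
) -> None:
    """Forward every reference to the store, passing the flag through unchanged."""
    for key, (location, offset, length) in refs.items():
        store.set_virtual_ref(
            key,
            location,
            offset=offset,
            length=length,
            validate_containers=validate_containers,
        )


refs = {"temp/c/0/0": ("s3://bucket/a.nc", 0, 100)}

default_store = RecordingStore()
write_refs(default_store, refs)  # validation stays on

override_store = RecordingStore()
write_refs(override_store, refs, validate_containers=False)  # explicit opt-out
```

Making the argument keyword-only keeps the opt-out explicit at the call site.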
We have code to automatically interpret a URI as being a local file, S3-compatible store, or HTTPStore using Obstore. https://github.com/zarr-developers/VirtualiZarr/blob/0bf830e63bc5bdedd87cd824d064071600d5b44a/virtualizarr/manifests/store.py#L142 But it feels a bit fragile and not really core...
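For illustration, the kind of scheme-based dispatch being discussed could look roughly like the sketch below. It uses only the standard library and is not the actual code in `store.py`; `classify_uri` and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_uri(uri: str) -> str:
    """Guess the kind of store a URI points at, based only on its scheme.

    Returns "local", "s3", or "http". Anything else raises ValueError
    rather than being guessed at.
    """
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("", "file"):
        # Bare paths have no scheme. Windows drive letters such as "C:\\..."
        # would be misread as a scheme; this sketch does not handle them.
        return "local"
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    raise ValueError(f"Unrecognised URI scheme {scheme!r} in {uri!r}")


kinds = [
    classify_uri(u)
    for u in ("/tmp/a.nc", "file:///tmp/a.nc", "s3://bucket/a.nc", "https://example.com/a.nc")
]
print(kinds)
```

Even this small version shows where the fragility comes from: the scheme alone says nothing about endpoints, credentials, or S3-compatible services served over HTTPS.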
Encountered while working on #557

```python
import virtualizarr as vz

vz.open_virtual_dataset(
    's3://cworthy/oae-efficiency-atlas/data/experiments/000/01/alk-forcing.000-1999-01.pop.h.0347-01.nc',
    loadable_variables=[],
    decode_times=False,
    reader_options={'storage_options': {'anon': True, 'endpoint_url': 'https://data.source.coop/'}},
)
```

```python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File :1
File ~/Documents/Work/Code/VirtualiZarr/virtualizarr/backend.py:351, in...
```
Once we've completed https://github.com/zarr-developers/VirtualiZarr/issues/473, the responsibility of each reader will essentially be reduced to just creating a fully virtual `ManifestStore` instance from the given filepath. This means that we could...
I realised that the in-progress `ManifestStore` refactor would actually allow us to separate concerns so much that we could potentially make xarray an optional dependency, where you only need xarray...
Should we add a top-level function `open_virtual_mfdataset`? Though I would like to be more confident about the best way to parallelize reference generation first https://github.com/zarr-developers/VirtualiZarr/issues/123 https://github.com/zarr-developers/VirtualiZarr/issues/7
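As one possible shape for the parallel part (not a settled design), here is a thread-pool sketch that opens each file independently and returns the results in input order, leaving the combine step out entirely. `open_many` and `fake_open` are hypothetical; a real version would call the actual per-file opener and might prefer a different executor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def open_many(
    paths: Sequence[str],
    open_one: Callable[[str], T],
    max_workers: int = 4,
) -> list[T]:
    """Apply `open_one` to every path concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(open_one, paths))


def fake_open(path: str) -> dict:
    """Hypothetical stand-in for opening one file as a virtual dataset."""
    return {"path": path}


results = open_many(["a.nc", "b.nc", "c.nc"], fake_open)
print([r["path"] for r in results])
```

Threads suit the I/O-bound case of reading metadata from object storage; whether that is the right default is exactly the open question in the linked issues.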