Tom Nicholas issues

Results: 303 issues of Tom Nicholas

Make Manifests' in-memory reference structure pluggable

Currently the `ChunkManifest` is hardcoded to use numpy arrays underneath to store the paths, offsets, and byte ranges. However, there are a few cases where we might want to use another format: -...

enhancement

internals
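One way to picture a pluggable reference structure is a small interface that several backends satisfy. The sketch below is illustrative only: `ReferenceBackend`, `ColumnarBackend`, `DictBackend`, and `total_bytes` are hypothetical names, not part of VirtualiZarr, and the columnar backend uses the standard-library `array` module as a stand-in for the current numpy layout.

```python
from array import array
from typing import Protocol

# One reference: (path, byte offset, byte length).
Ref = tuple[str, int, int]


class ReferenceBackend(Protocol):
    """Hypothetical interface; not part of VirtualiZarr's API."""

    def get(self, chunk_index: int) -> Ref: ...

    def __len__(self) -> int: ...


class ColumnarBackend:
    """Parallel columns, standing in for the current numpy-array layout."""

    def __init__(self, paths: list[str], offsets: list[int], lengths: list[int]) -> None:
        self._paths = list(paths)
        self._offsets = array("Q", offsets)  # unsigned integers, at least 64-bit
        self._lengths = array("Q", lengths)

    def get(self, chunk_index: int) -> Ref:
        return (
            self._paths[chunk_index],
            self._offsets[chunk_index],
            self._lengths[chunk_index],
        )

    def __len__(self) -> int:
        return len(self._paths)


class DictBackend:
    """Mapping from chunk index to reference, as an alternative layout."""

    def __init__(self, refs: dict[int, Ref]) -> None:
        self._refs = dict(refs)

    def get(self, chunk_index: int) -> Ref:
        return self._refs[chunk_index]

    def __len__(self) -> int:
        return len(self._refs)


def total_bytes(backend: ReferenceBackend) -> int:
    """Sum referenced byte lengths using only the shared interface.

    Assumes chunk indices run from 0 to len(backend) - 1.
    """
    return sum(backend.get(i)[2] for i in range(len(backend)))
```

Code written against the interface, such as `total_bytes`, works unchanged whichever backend holds the references.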

References formats comparison

Compare Kerchunk JSON/Parquet/Icechunk

documentation

references formats

Use lithops RetryingFunctionExecutor to automatically retry tasks

- [ ] Closes #xxxx
- [ ] Tests added
- [ ] Tests passing
- [ ] Full type hint coverage
- [ ] Changes are documented in `docs/releases.rst`...

enhancement

Add `.nrefs` method to accessor that returns number of virtual chunk references

It would be neat to add a method `vds.nrefs()` to the accessor, which returns the total number of virtual references in the dataset. It's useful to know how many millions...

enhancement

good first issue
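A rough sketch of the counting logic such a method might use, independent of the accessor itself. Two assumptions are made here and are not confirmed by the issue text: each variable's manifest is represented as a flat list of path strings, and an empty string marks a chunk with no reference. The function name `count_refs` is hypothetical.

```python
def count_refs(manifest_paths: dict[str, list[str]]) -> int:
    """Count non-empty path entries across all variables.

    Assumption: an empty string marks a chunk that has no virtual reference.
    """
    return sum(
        1
        for paths in manifest_paths.values()
        for path in paths
        if path != ""
    )


# Two variables: one with a missing chunk, one fully populated.
example = {
    "temp": ["s3://bucket/a.nc", "", "s3://bucket/a.nc"],
    "salt": ["s3://bucket/b.nc"],
}
total = count_refs(example)
```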

Publicly expose icechunk's `validate_containers` kwarg

Icechunk now has a `validate_containers` kwarg to `.set_virtual_ref`/`.set_virtual_refs`. We should publicly expose this in `.virtualize.to_icechunk()`. I think we want to make `True` the default and expose it for overriding. The...

enhancement

Icechunk 🧊

Fragility of url auto-parsing logic

We have code to automatically interpret a URI as being a local file, S3-compatible store, or HTTPStore using Obstore. https://github.com/zarr-developers/VirtualiZarr/blob/0bf830e63bc5bdedd87cd824d064071600d5b44a/virtualizarr/manifests/store.py#L142 But it feels a bit fragile and not really core...
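To illustrate the kind of fragility in question, here is a minimal scheme-based classifier built on the standard library. It is not the code at the linked line; `classify_uri` and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_uri(uri: str) -> str:
    """Guess the store type from the URI scheme alone (illustrative only)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("", "file"):
        return "local"
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    raise ValueError(f"Unsupported URI scheme {scheme!r} in {uri!r}")
```

Even this small version has a sharp edge: a Windows path such as `C:\data\file.nc` parses with the scheme `c`, so it is rejected rather than treated as a local file.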

Support passing configuration options to default_object_store

Encountered while working on #557

```python
vz.open_virtual_dataset(
    's3://cworthy/oae-efficiency-atlas/data/experiments/000/01/alk-forcing.000-1999-01.pop.h.0347-01.nc',
    loadable_variables=[],
    decode_times=False,
    reader_options={'storage_options': {'anon': True, 'endpoint_url': 'https://data.source.coop/'}},
)
```

```python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File :1
File ~/Documents/Work/Code/VirtualiZarr/virtualizarr/backend.py:351, in...
```

bug

Redefine Virtual Readers as `func(filepath) -> ManifestStore`

Once we've completed https://github.com/zarr-developers/VirtualiZarr/issues/473, the responsibility of each reader will essentially be reduced to just creating a fully virtual `ManifestStore` instance from the given filepath. This means that we could...

internals

readers
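The proposed shape, a reader as a plain callable from filepath to store, can be sketched as a type alias. Everything below is a stand-in: `ToyManifestStore` is not the real `ManifestStore`, and `toy_reader` and `open_with` are hypothetical names used only to show the signature.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyManifestStore:
    """Stand-in for the real ManifestStore; it only records references."""

    filepath: str
    # Maps a chunk key to (path, byte offset, byte length).
    refs: dict[str, tuple[str, int, int]] = field(default_factory=dict)


# A reader is any callable with this signature.
Reader = Callable[[str], ToyManifestStore]


def toy_reader(filepath: str) -> ToyManifestStore:
    # A real reader would parse the file's metadata; this one hard-codes
    # a single placeholder reference.
    return ToyManifestStore(filepath, {"var/0": (filepath, 0, 16)})


def open_with(filepath: str, reader: Reader) -> ToyManifestStore:
    """Delegate entirely to the reader, which owns all format knowledge."""
    return reader(filepath)


store = open_with("data.nc", toy_reader)
```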

Make xarray an optional dependency?

I realised that the in-progress `ManifestStore` refactor would actually allow us to separate concerns so much that we could potentially make xarray an optional dependency, where you only need xarray...

dependencies

internals

open_virtual_mfdataset

Should we add a top-level function `open_virtual_mfdataset`? Though I would like to be more confident about the best way to parallelize reference generation first: https://github.com/zarr-developers/VirtualiZarr/issues/123, https://github.com/zarr-developers/VirtualiZarr/issues/7

enhancement