Support passing configuration options to default_object_store
Encountered while working on #557
import virtualizarr as vz

vz.open_virtual_dataset(
    's3://cworthy/oae-efficiency-atlas/data/experiments/000/01/alk-forcing.000-1999-01.pop.h.0347-01.nc',
    loadable_variables=[],
    decode_times=False,
    reader_options={'storage_options': {'anon': True, 'endpoint_url': 'https://data.source.coop/'}},
)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File <timed exec>:1
File ~/Documents/Work/Code/VirtualiZarr/virtualizarr/backend.py:351, in open_virtual_mfdataset(paths, concat_dim, compat, preprocess, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
347 executor = get_executor(parallel=parallel)
348 with executor() as exec:
349 # wait for all the workers to finish, and send their resulting virtual datasets back to the client for concatenation there
350 virtual_datasets = list(
--> 351 exec.map(
352 open_func,
353 paths1d,
354 )
355 )
357 # TODO add file closers
358
359 # Combine all datasets, closing them in case of a ValueError
360 try:
File ~/Documents/Work/Code/VirtualiZarr/virtualizarr/parallel.py:310, in LithopsEagerFunctionExecutor.map(self, fn, timeout, chunksize, *iterables)
307 fexec = lithops.FunctionExecutor()
309 futures = fexec.map(fn, *iterables)
--> 310 results = fexec.get_result(futures)
312 return results
...
File /usr/local/lib/python3.12/site-packages/virtualizarr/manifests/store.py:134, in _find_bucket_region()
File /usr/local/lib/python3.12/site-packages/requests/structures.py:52, in __getitem__()
KeyError: 'x-amz-bucket-region'
From @sharkinsspatial:
This is a case where, as @Max outlined in https://github.com/zarr-developers/VirtualiZarr/issues/553, we are caught in a transition period where the open_virtual_dataset signature no longer aligns with the internals. In this case you would want to pass an S3Store, pre-constructed using the aws_endpoint argument pointing to the https://data.source.coop/ proxy endpoint, into a store arg for open_virtual_dataset. So we will need some way for users to inject their own obstore for whatever function signature we decide on.
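Concretely, that suggestion would look roughly like the sketch below. Note that the store parameter does not exist yet (it is the very thing being requested here), and the endpoint / skip_signature keys are assumptions about obstore's S3 configuration rather than confirmed API:

import obstore as obs
import virtualizarr as vz

# Pre-construct an S3Store pointed at the proxy endpoint.
# `endpoint` and `skip_signature` are assumed obstore S3 config keys;
# skip_signature requests unsigned (anonymous) access, analogous to anon=True.
store = obs.store.S3Store(
    "cworthy",
    endpoint="https://data.source.coop/",
    skip_signature=True,
)

# Hypothetical `store` parameter, per the suggestion above.
vds = vz.open_virtual_dataset(
    "oae-efficiency-atlas/data/experiments/000/01/alk-forcing.000-1999-01.pop.h.0347-01.nc",
    store=store,
)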
In addition to what Sean said, https://github.com/zarr-developers/VirtualiZarr/pull/558 at least gives you an informative error message for what went wrong.
We also probably want to encourage backends to flag if they receive options that they do not use. In this case the newer HDF5 reader does not use storage_options.
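As a minimal sketch of that idea (the helper and its arguments are hypothetical, not VirtualiZarr API), a backend could do something like:

import warnings

def _warn_on_unused_options(reader_options: dict | None, supported: set[str]) -> None:
    # Warn loudly instead of silently ignoring options the backend won't use,
    # e.g. storage_options passed to a reader that never touches fsspec.
    unused = set(reader_options or {}) - supported
    if unused:
        warnings.warn(
            f"reader_options {sorted(unused)} are not used by this backend "
            "and will be ignored",
            UserWarning,
        )

# A reader that consumes no reader_options at all would then surface the problem:
_warn_on_unused_options({"storage_options": {"anon": True}}, supported=set())
# UserWarning: reader_options ['storage_options'] are not used by this backend and will be ignored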
I think, from the recent conversation on Slack, people would rather have a way to pass configuration options into default_object_store() than explicitly provide a store. @TomNicholas are you working on this? Otherwise I could open a PR, but I'm happy to leave it to you if you're on it.
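For illustration, that API might look something like the following; the object_store_options kwarg is an invented placeholder name, since the actual signature is exactly what remains to be decided:

import virtualizarr as vz

# Hypothetical: forward config through to default_object_store() rather than
# pre-constructing and passing a store object. Neither the kwarg name nor the
# accepted keys exist yet.
vds = vz.open_virtual_dataset(
    "s3://cworthy/oae-efficiency-atlas/data/experiments/000/01/alk-forcing.000-1999-01.pop.h.0347-01.nc",
    object_store_options={
        "endpoint": "https://data.source.coop/",
        "skip_signature": True,
    },
)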
I was about to have a go, but haven't started yet!
Nice, I'll provide a review then. Thanks for working on it!
It turns out that this works totally fine without any changes to the code:
import virtualizarr as vz
from virtualizarr.readers import HDFVirtualBackend

vds = vz.open_virtual_dataset(
    "https://data.source.coop/cworthy/oae-efficiency-atlas/data/polygon_masks.nc",
    backend=HDFVirtualBackend,
)
The reason is that, under the hood, passing the URL in this form creates an obstore HTTPStore instead of an S3Store, like this:
import obstore as obs

store = obs.store.HTTPStore.from_url("https://data.source.coop/")
and then
reader = ObstoreReader(store=store, path="https://data.source.coop/cworthy/oae-efficiency-atlas/data/polygon_masks.nc")
which works fine with no further configuration options passed.
I'm going to close #560 for now as unnecessary (but not close this issue, as this use case should at least be documented).
Completed by #601