Use obstore / obspec for globbing in `open_virtual_mfdataset`
Problem
xarray.open_mfdataset accepts a string with wildcards, and then uses this janky bit of fsspec code to glob with it.
But it's pretty fragile - in particular it confusingly raises if you try to use glob syntax with an s3 URL without using the xarray zarr backend.
VirtualiZarr currently imports that private internal to do the same kind of globbing, but VirtualiZarr doesn't even have backends in the same way, which is why I attempted to improve the situation upstream (see https://github.com/pydata/xarray/pull/9930).
Solution
However, I now realize that a better way to improve xarray upstream might be to use obstore and obspec instead of fsspec, and to build a robust internal utility in xarray that doesn't raise a random exception for only one xarray backend and that virtualizarr can safely import.
Therefore I think we should:
- vendor those internals into virtualizarr instead of importing them (soon, because I think globbing remote URLs from `open_virtual_mfdataset` is broken right now because of that exception),
- iterate and improve them using obstore and obspec,
- eventually push the changes upstream so that xarray no longer needs fsspec for that.
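To make the plan above concrete, here is a rough sketch of what a listing-based glob could look like once it no longer depends on fsspec. Everything in it is an assumption for illustration: the `SupportsListPaths` protocol and its `list_paths` method are hypothetical (not obspec's actual interface), `InMemoryStore` stands in for a real object store, and only `*`, `?` and `**` are handled (no character classes, no special treatment of `**/`).

```python
from __future__ import annotations

import re
from typing import Iterable, Protocol


class SupportsListPaths(Protocol):
    """Hypothetical structural protocol: anything that can list keys under a prefix."""

    def list_paths(self, prefix: str) -> Iterable[str]: ...


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob to a regex where `*` and `?` stay within one path segment."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")  # `**` may cross `/`
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # `*` stops at `/`
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def literal_prefix(pattern: str) -> str:
    """Leading wildcard-free part of the pattern, used to narrow the listing."""
    for i, char in enumerate(pattern):
        if char in "*?":
            return pattern[:i]
    return pattern


def glob(store: SupportsListPaths, pattern: str) -> list[str]:
    """List everything under the literal prefix, then filter with the regex."""
    regex = glob_to_regex(pattern)
    candidates = store.list_paths(literal_prefix(pattern))
    return sorted(path for path in candidates if regex.fullmatch(path))


class InMemoryStore:
    """Stand-in for a real store; satisfies the protocol without subclassing it."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._paths = list(paths)

    def list_paths(self, prefix: str) -> list[str]:
        return [p for p in self._paths if p.startswith(prefix)]


store = InMemoryStore(
    [
        "data/2020/a.nc",
        "data/2020/b.nc",
        "data/2020/sub/c.nc",
        "data/2021/a.nc",
        "other/a.nc",
    ]
)
one_year = glob(store, "data/2020/*.nc")
all_years = glob(store, "data/*/a.nc")
recursive = glob(store, "data/**.nc")
```

Because the store is only required to satisfy a protocol, the same `glob` function would work against any backend that can list by prefix, which is the property that would let xarray and virtualizarr share it.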
cc the usual suspects @maxrjones @sharkinsspatial @kylebarron
EDIT: related to #568 too.
This seems great. We could even make a standalone utility for globbing. There are probably enough niche edge cases around how glob characters are interpreted.
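As one small illustration of the kind of edge case meant here (using only the Python standard library, not fsspec or obstore), two stdlib matchers already disagree on whether `*` may cross a `/`:

```python
import fnmatch
from pathlib import PurePosixPath

path = "data/2020/a.nc"
pattern = "data/*.nc"

# fnmatch treats "/" as an ordinary character, so "*" spans directories.
matches_fnmatch = fnmatch.fnmatchcase(path, pattern)

# PurePosixPath.match compares path segments, so "*" stays within one segment.
matches_pathlib = PurePosixPath(path).match(pattern)

print(matches_fnmatch, matches_pathlib)
```

A standalone globbing utility would have to pick one of these behaviours and document it.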
Perhaps we should go ahead with a 0.1 release of obspec soon (and leave the question of exceptions for the future)?
We could even make a standalone utility for globbing.
fsspec.glob() is very useful.
(and leave the question of exceptions for the future)
The exceptions thing I mentioned is unrelated to obspec - that's just a quirk in the code Martin added to Xarray.
We could even make a standalone utility for globbing.
fsspec.glob() is very useful.
Perhaps this could go into a library like obspec-utils. That would provide a clear separation between the core library/protocol and extra provided functionality.
(and leave the question of exceptions for the future)
The exceptions thing I mentioned is unrelated to obspec - that's just a quirk in the code Martin added to Xarray.
No, I mean that obspec is currently "blocked" on trying to define core exceptions, because exceptions only support nominal subtyping (subclassing) while obspec only supports structural subtyping (protocols).
But we can release an obspec with undefined exceptions, so people can do anything on top of obspec as long as they don't need a list of "permitted exceptions".
https://github.com/developmentseed/obspec/issues/12
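A minimal sketch of the nominal vs structural mismatch described above, using only `typing.Protocol`. The names `HasStatusCode` and `BackendNotFound` are hypothetical and are not taken from obspec or obstore:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasStatusCode(Protocol):
    """Hypothetical structural description of an error carrying a status code."""

    status_code: int


class BackendNotFound(Exception):
    """Hypothetical backend error; it shares no base class with the protocol."""

    status_code = 404


def catch_by_protocol() -> str:
    try:
        raise BackendNotFound("missing key")
    except HasStatusCode:  # a Protocol is not a BaseException subclass
        return "caught"


# Structural check: passes because the attribute is present.
structural_ok = isinstance(BackendNotFound("missing key"), HasStatusCode)

# Nominal check: `except` only accepts BaseException subclasses, so Python
# raises TypeError while trying to match the handler.
try:
    outcome = catch_by_protocol()
except TypeError:
    outcome = "TypeError"

print(structural_ok, outcome)
```

So a protocol can describe what an error looks like, but callers cannot write `except` against it, which is why a list of "permitted exceptions" does not fit a protocol-only library.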