xarray
`"source"` encoding for datasets opened from `fsspec` objects
When opening files from path-like objects (`str`, `pathlib.Path`), the backend machinery (`_dataset_from_backend_dataset`) sets the `"source"` encoding. This is useful if we need the original path for additional processing, like writing to a similarly named file, or to extract additional metadata. This would be useful as well when using `fsspec` to open remote files.
In this PR, I'm extracting the `path` attribute that most `fsspec` objects have to set that value. I've considered using `isinstance` checks instead of the `getattr`-with-default, but the list of potential classes is too big to be practical (at least 4 classes just within `fsspec` itself).
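
As a rough illustration of the `getattr`-with-default approach described above, here is a minimal sketch. The helper name `source_from` and the `FakeRemoteFile` stand-in class are hypothetical and only serve to keep the example self-contained; this is not the code in the PR.

```python
import io
import os
import pathlib


def source_from(filename_or_obj):
    """Hypothetical helper: return a string for the "source" encoding, or None."""
    # Path-like inputs (str, pathlib.Path) are converted to a plain string.
    if isinstance(filename_or_obj, (str, os.PathLike)):
        return os.fspath(filename_or_obj)
    # File-like inputs: use a `path` attribute if one exists, instead of
    # checking against a long list of classes with isinstance.
    path = getattr(filename_or_obj, "path", None)
    return path if isinstance(path, str) else None


class FakeRemoteFile(io.BytesIO):
    """Stand-in for an fsspec-style file object (assumption: it exposes `.path`)."""

    def __init__(self, path, data=b""):
        super().__init__(data)
        self.path = path


remote = FakeRemoteFile("bucket/data.nc")
print(source_from("local.nc"))
print(source_from(pathlib.Path("local.nc")))
print(source_from(remote))
print(source_from(io.BytesIO()))
```

Objects without a `path` attribute simply yield `None` here, so no `"source"` value would be set for them.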
If this sounds like a good idea, I'll update the documentation of the `"source"` encoding to mention this feature.
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
Without knowing much (I generally `ds.reset_encoding()`), it does sound like a good idea!
Shouldn't `_normalize_path` or `_find_absolute_paths` be able to handle this?
The main use case is indeed to extract additional data, which you'd do immediately after `open_dataset` (after which you could drop the encoding).
> Shouldn't `_normalize_path` or `_find_absolute_paths` be able to handle this?
As far as I can tell, they only convert path-likes to string (which these objects are not, they are file-like, not path-like). Are you suggesting we should change that?
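
To make the path-like versus file-like distinction concrete, here is a small standard-library sketch. The helper `is_path_like` is a hypothetical name for illustration and is not one of xarray's functions.

```python
import io
import os
import pathlib


def is_path_like(obj):
    """True for str and os.PathLike objects."""
    return isinstance(obj, (str, os.PathLike))


buffer = io.BytesIO(b"some bytes")  # file-like: has read(), but is not a path

print(is_path_like("data.nc"))
print(is_path_like(pathlib.Path("data.nc")))
print(is_path_like(buffer))

# os.fspath() only accepts path-like inputs, so a file-like object is rejected.
try:
    os.fspath(buffer)
    fspath_failed = False
except TypeError:
    fspath_failed = True
print(fspath_failed)
```

A function that only normalizes path-likes to strings has nothing to convert for such an object, which is why the path has to come from somewhere else.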
I think this is fine, but our long-term goal is to delete `encoding`, so you might consider a different solution to your problem.
My impression of that discussion was that we wanted to either return the encoding in a separate object, or somehow remove the encoding after the first operation (i.e. not carry it around). Either way would be fine with me, since I would still have access to it immediately after opening.
Would a dataset with this in encoding be round tripped without error? Would be good to test that
> Would a dataset with this in encoding be round tripped without error? Would be good to test that
I'm not opposed to adding an explicit test (since I can't find any existing one right now), but if it caused problems we'd also have those with string paths / URLs, and those have been working just fine for a long time.
As far as I can tell, `"source"`, as well as `"original_shape"`, are dropped from the encoding before doing anything else (search for `safe_to_drop` for where that happens).
Ah, thanks. My mistake, I thought we were sticking in the fsspec object, not just the path.
As far as I can tell, we could write anything in that encoding (`fsspec` objects, strings, or other things), and it would simply be ignored / dropped before writing.
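
The behaviour described above can be pictured with the following simplified sketch. It only mirrors the idea of a `safe_to_drop` set containing `"source"` and `"original_shape"`; the function name `encoding_for_writing` is hypothetical and this is not xarray's actual implementation.

```python
# Keys that carry bookkeeping information only and are ignored when writing.
SAFE_TO_DROP = frozenset({"source", "original_shape"})


def encoding_for_writing(encoding):
    """Hypothetical helper: return a copy of `encoding` without the droppable keys."""
    return {key: value for key, value in encoding.items() if key not in SAFE_TO_DROP}


encoding = {
    "source": object(),        # could be a string, a file object, or anything else
    "original_shape": (3, 4),
    "dtype": "float32",
}
cleaned = encoding_for_writing(encoding)
print(cleaned)
```

Because the key is removed regardless of its value, the type of whatever was stored under `"source"` would not matter for the round trip in a scheme like this.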