xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Opt out of auto creating index variables

Open TomNicholas opened this issue 1 year ago • 8 comments

Tries fixing #8704 by cherry-picking from #8124 as @benbovy suggested in https://github.com/pydata/xarray/issues/8704#issuecomment-1926868422

  • [x] Closes #8704
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

TomNicholas avatar Feb 05 '24 22:02 TomNicholas

This currently doesn't work - when I run test_coordinates.py I get some failures:

FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_init_default_index - AssertionError: {'x': <class 'xarray.core.variable.Variable'>}
FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_getitem - AssertionError: {'x': <class 'xarray.core.variable.Variable'>}
FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_assign - AssertionError: {'x': <class 'xarray.core.variable.Variable'>, 'y': <class 'xarray.core.variable.IndexVariable'>}
FAILED xarray/tests/test_coordinates.py::TestCoordinates::test_align - ValueError: cannot reindex or align along dimension 'x' because of conflicting dimension sizes: {2, 3}

The first couple of errors seem to be because the 1D variable doesn't have a corresponding index created by default.

I noticed that the local variable default_indexes is being defined within Coordinates.__init__ as an empty dict, which is never changed before being used to update indexes. This seems like an indication of some missing logic to populate default_indexes?

https://github.com/pydata/xarray/pull/8711/files#diff-c73f63e529ab4065cca1aa23d93c91a3b060f314e2e498b708b90fef87141a88R286

TomNicholas avatar Feb 05 '24 22:02 TomNicholas

Sorry the two commits picked from #8124 where done before #8107 and messed up things a bit. I also picked and added the last merge commit from #8124, which should hopefully make things better (coordinate tests are all running fine locally at least).

benbovy avatar Feb 06 '24 12:02 benbovy

About the auto_convert option added to as_variable() here: this is a temporary workaround before we make a deeper and better refactoring of non-index vs. index coordinate variables. That is not super nice since as_variable() is public API, but maybe that's fine if it is clearly stated in the docstrings that auto_convert is for internal use only (and we likely be removed in the future)?

benbovy avatar Feb 06 '24 12:02 benbovy

Thanks @benbovy !

That is not super nice since as_variable() is public API, but maybe that's fine if it is clearly stated in the docstrings that auto_convert is for internal use only (and we likely be removed in the future)?

I mean I don't think that's a huge deal, but if you definitely wanted to hide it from users couldn't you make a private _as_variable(..., auto_convert) and use that for internal calls and have the public as_variable call the private one?

TomNicholas avatar Feb 06 '24 17:02 TomNicholas

Why is as_variable public API?

dcherian avatar Feb 06 '24 17:02 dcherian

why is as_variable public API?

More context in #1303

benbovy avatar Feb 06 '24 17:02 benbovy

I mean I don't think that's a huge deal

Me neither (I wanted to see what others think).

@TomNicholas have you tried running your notebook mentioned in https://github.com/pydata/xarray/issues/8704#issuecomment-1927264039 with this PR? Does it fix the errors raised?

benbovy avatar Feb 07 '24 08:02 benbovy

@TomNicholas have you tried running your notebook mentioned in https://github.com/pydata/xarray/issues/8704#issuecomment-1927264039 with this PR? Does it fix the errors raised?

Yes I did, and I think it does! I still can't do a full xr.concat on Dataset objects because of a different bug (see https://github.com/pydata/xarray/pull/8714), but with this PR gets to the same point as v2023.08.0 does at least. 😀

TomNicholas avatar Feb 07 '24 14:02 TomNicholas

@benbovy I would really like to get this merged as I'm using it in a new package (see https://github.com/TomNicholas/VirtualiZarr/pull/42), but I'm not really sure what the steps to make this PR ready for merging are. Does this need any new tests?

TomNicholas avatar Mar 19 '24 16:03 TomNicholas

Can we catch these warnings if they are expected?


xarray/tests/test_variable.py::TestVariable::test_as_variable
  /home/runner/work/xarray/xarray/xarray/tests/test_variable.py:1219: FutureWarning: variable 'x' with name matching its dimension will not be automatically converted into an `IndexVariable` object in the future.
    actual = as_variable(data, name="x")

xarray/tests/test_variable.py::TestVariable::test_as_variable
  /home/runner/work/xarray/xarray/xarray/tests/test_variable.py:1237: FutureWarning: variable 'time' with name matching its dimension will not be automatically converted into an `IndexVariable` object in the future.
    assert as_variable(dt, "time").dtype.kind == "M"

xarray/tests/test_variable.py::TestVariable::test_as_variable
  /home/runner/work/xarray/xarray/xarray/tests/test_variable.py:1239: FutureWarning: variable 'time' with name matching its dimension will not be automatically converted into an `IndexVariable` object in the future.
    assert as_variable(td, "time").dtype.kind == "m"

dcherian avatar Mar 22 '24 21:03 dcherian

This should be done now!

Can we catch these warnings if they are expected?

Done

TomNicholas avatar Mar 22 '24 22:03 TomNicholas