cf-xarray

Decoder for MultiIndexes fails if other variables use a dimension that is part of the MultiIndex

Status: Open · okz opened this issue 11 months ago · 0 comments

First, thank you so much. Compression-by-gathering is an incredibly useful addition, which will hopefully end up in xarray one day to provide ragged (or sparse) array support for netCDF files.

#321 added support for encoding and decoding pandas MultiIndexes using "compression by gathering". However, if other variables in the dataset use a dimension that is also a level of the MultiIndex, decoding fails.
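For readers new to the technique: in the CF conventions, compression by gathering stores only the points that are actually present, as flat row-major indices into the full product of the component dimensions, and a `compress` attribute on the list variable names those dimensions (here `"lat lon"`). The sketch below illustrates only that idea with plain NumPy; it is not cf_xarray's implementation, and `encode_gathered` / `decode_gathered` are hypothetical helper names.

```python
import numpy as np

# Component dimensions, mirroring
# pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("lat", "lon")).
lat = np.array(["a", "b"])
lon = np.array([1, 2])

# Hypothetical subset of (lat, lon) pairs that are actually present.
present = [("a", 2), ("b", 1), ("b", 2)]


def encode_gathered(pairs, lat, lon):
    """Encode (lat, lon) pairs as flat row-major indices into the lat x lon grid."""
    lat_pos = [int(np.flatnonzero(lat == la)[0]) for la, _ in pairs]
    lon_pos = [int(np.flatnonzero(lon == lo)[0]) for _, lo in pairs]
    return np.ravel_multi_index((lat_pos, lon_pos), (lat.size, lon.size))


def decode_gathered(flat, lat, lon):
    """Recover the (lat, lon) pairs by unravelling the flat indices."""
    i, j = np.unravel_index(flat, (lat.size, lon.size))
    return list(zip(lat[i].tolist(), lon[j].tolist()))


flat = encode_gathered(present, lat, lon)
print(flat)                             # [1 2 3]
print(decode_gathered(flat, lat, lon))  # [('a', 2), ('b', 1), ('b', 2)]
```

Decoding therefore needs the component dimensions `lat` and `lon` (their sizes and values) to rebuild each MultiIndex level, which is why those names appear as dimensions in the encoded dataset.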

A minimal example, derived from the Encoding and Decoding tutorial, needs only one extra line defining var_with_lat:

import numpy as np
import pandas as pd
import xarray as xr
import cf_xarray as cfxr

ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4), {"foo": "bar"})},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

# UNCOMMENTING THE NEXT LINE MAKES THE DECODING FAIL:
# ds["var_with_lat"] = xr.DataArray([1, 2], dims="lat")

encoded = cfxr.encode_multi_index_as_compress(ds, "landpoint")
decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

Once var_with_lat is added, decoding fails:

---> 129 decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

File ~/devenv3/lib/python3.11/site-packages/cf_xarray/coding.py:116, in decode_compress_to_multi_index(encoded, idxnames)
    110     from xarray.indexes import PandasMultiIndex
    112     variables = {
    113         dim: encoded[dim].isel({dim: xr.Variable(data=index, dims=idxname)})
    114         for dim, index in zip(names, indices)
    115     }
--> 116     decoded = decoded.assign_coords(variables).set_xindex(
    117         names, PandasMultiIndex
    118     )
    119 except ImportError:
    120     arrays = [encoded[dim].data[index] for dim, index in zip(names, indices)]

File ~/devenv3/lib/python3.11/site-packages/xarray/core/dataset.py:4330, in Dataset.set_xindex(self, coord_names, index_cls, **options)
   4327 indexed_coords = set(coord_names) & set(self._indexes)
   4329 if indexed_coords:
-> 4330     raise ValueError(
   4331         f"those coordinates already have an index: {indexed_coords}"
   4332     )
   4334 coord_vars = {name: self._variables[name] for name in coord_names}
   4336 index = index_cls.from_variables(coord_vars, options=options)

ValueError: those coordinates already have an index: {'lat'}
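The last frame is xarray's own guard in `Dataset.set_xindex`: it refuses to build a new index over coordinates that already carry one. The sketch below reproduces that guard without cf_xarray, assuming a recent xarray that provides `Dataset.set_xindex` and `xarray.indexes.PandasIndex`. A dimension coordinate such as `lat` receives a default index automatically, so requesting another one raises the same ValueError.

```python
import xarray as xr
from xarray.indexes import PandasIndex

# "lat" is a dimension coordinate here, so xarray indexes it automatically.
ds = xr.Dataset(coords={"lat": ("lat", ["a", "b"])})

message = None
try:
    # Asking for a second index on an already-indexed coordinate is rejected.
    ds.set_xindex("lat", PandasIndex)
except ValueError as err:
    message = str(err)

print(message)  # those coordinates already have an index: {'lat'}
```

A plausible reading (not confirmed here) is that var_with_lat brings `lat` along as an indexed dimension coordinate, so by the time the decoder calls `set_xindex(names, PandasMultiIndex)`, `lat` is already indexed.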

okz, Jul 30 '23 08:07