Error when adding a DataArray to an existing Dataset with a MultiIndex
What is your issue?
This is a mixture between question, bug (potentially) and general issue, so feel free to label it accordingly.
Here is my question: what is the recommended approach to add an xr.DataArray to an existing xr.Dataset with a MultiIndex?
To give some more context, I have an xarray.Dataset called market with several variables and coordinates, one of which, timeslice, is a MultiIndex. This is what it looks like:
<xarray.Dataset>
Dimensions: (region: 1, commodity: 6, timeslice: 6, year: 8)
Coordinates:
* region (region) object 'R1'
* commodity (commodity) object 'electricity' 'gas' ... 'CO2f' 'wind'
units_prices (commodity) object 'MUS$2010/GWh' ... 'MUS$2010/kt'
* timeslice (timeslice) object MultiIndex
* month (timeslice) object 'all-year' 'all-year' ... 'all-year'
* day (timeslice) object 'all-week' 'all-week' ... 'all-week'
* hour (timeslice) object 'night' 'morning' ... 'late-peak' 'evening'
* year (year) int64 2020 2025 2030 2035 2040 2045 2050 2055
Data variables:
prices (commodity, region, year, timeslice) float64 0.0702 ... 0.0
exports (commodity, region, year, timeslice) float64 0.0 0.0 ... 0.0
imports (timeslice, commodity, region, year) float64 0.0 0.0 ... 0.0
static_trade (timeslice, commodity, region, year) float64 0.0 0.0 ... 0.0
Now, I want to add another variable, called supply, identical to exports but filled with zeros. In code that was working with xarray==2022.3.0 and pandas==1.4.4, I was simply doing:
market["supply"] = xr.zeros_like(market.exports)
And it worked totally fine. With the newest versions of xarray==2023.5.0 and pandas==2.0.2 under python 3.10, this fails with:
*** DeprecationWarning: Deleting a single level of a MultiIndex is deprecated. Previously, this deleted all levels of a MultiIndex. Please also drop the following variables: {'timeslice'} to avoid an error in the future.
I've tried variants like:
market["supply"] = market.exports * 0
market = market.assign(supply=xr.zeros_like(market.exports))
both failing with the same message.
I totally fail to see how this process is deleting a level of the MultiIndex - or modifying the indexes in any form. Probably it is because I don't understand the inner workings of xarray indexes.
The following works totally fine, but it is rather convoluted to have to create a brand new Dataset from scratch manually. It is also problematic if you really want to modify the Dataset in place (assign has the same problem).
vars = dict(market.data_vars)
vars["supply"] = xr.zeros_like(market.exports)
market = xr.Dataset(vars)
Resulting in:
<xarray.Dataset>
Dimensions: (region: 1, commodity: 6, timeslice: 6, year: 8)
Coordinates:
* region (region) object 'R1'
* commodity (commodity) object 'electricity' 'gas' ... 'CO2f' 'wind'
units_prices (commodity) object 'MUS$2010/GWh' ... 'MUS$2010/kt'
* timeslice (timeslice) object MultiIndex
* month (timeslice) object 'all-year' 'all-year' ... 'all-year'
* day (timeslice) object 'all-week' 'all-week' ... 'all-week'
* hour (timeslice) object 'night' 'morning' ... 'late-peak' 'evening'
* year (year) int64 2020 2025 2030 2035 2040 2045 2050 2055
Data variables:
prices (commodity, region, year, timeslice) float64 0.0702 ... 0.0
exports (commodity, region, year, timeslice) float64 0.0 0.0 ... 0.0
imports (timeslice, commodity, region, year) float64 0.0 0.0 ... 0.0
static_trade (timeslice, commodity, region, year) float64 0.0 0.0 ... 0.0
supply (commodity, region, year, timeslice) float64 0.0 0.0 ... 0.0
Many thanks for your support!
Thanks for taking the time to file a bug report!
I totally fail to see how this process is deleting a level of the MultiIndex - or modifying the indexes in any form. Probably it is because I don't understand the inner workings of xarray indexes.
I agree this is confusing and seems like it should work.
@dalonsoa It would be great if you could provide an MCVE here. It makes it much easier to debug for interested parties.
Hi @kmuehlbauer , many thanks for asking for an MCVE because, to be honest, I'm not able to reproduce the error with the following code which, I think, represents the situation we have at hand. It runs from beginning to end without any problem, using the same versions of xarray and pandas:
import xarray as xr
import numpy as np
da1 = xr.DataArray(
    np.arange(48).reshape(2, 2, 3, 4),
    coords=[
        ("v", [10, 20]),
        ("x", ["a", "b"]),
        ("y", [0, 1, 2]),
        ("z", ["alpha", "beta", "gamma", "delta"]),
    ],
)
da1 = da1.stack(w=("x", "z", "v"))
da2 = xr.zeros_like(da1.transpose("w", "y"))
da3 = xr.zeros_like(da1)
ds = xr.Dataset({"one": da1, "two": da2, "three": da3})
ds["four"] = xr.zeros_like(ds.one)
print(ds)
I'll investigate why my code is failing and this one is not. Could it be the way the MultiIndex is being created... 🤔 ?
If anyone is interested, this is the line of the code I'm refactoring that is causing me trouble: https://github.com/SGIModel/MUSE_OS/blob/9fb62bc0c3b7adeb9ce89dce9cad4856e1082925/src/muse/examples.py#L193
@dalonsoa Thanks for coming back this fast. I've also no real clue where the problem lies. It might be how the MultiIndex is created, as you are suggesting.
I've had a look at the tests over at your place to get an impression of how things are supposed to work. But there are too many fixtures to quickly adapt an MCVE from that, at least for someone who is not familiar with the code base. Would you be able to distill an MCVE from your test code?
Mmm... the code is rather convoluted - trying to simplify it - but I'll try to put something simple together that uses parts of the original code and reproduces the error. Bear with me while I do that.
I have not forgotten about this. I've tracked where and how the timeslice MultiIndex is created and wrote another example that closely matches that (see below), but that one also works...
The problem I have is that the process looks like:
1. timeslice MultiIndex is created using pd.MultiIndex.from_tuples.
2. A lot of stuff happens now, but timeslice remains the same... in principle.
3. The program finally fails when doing market["supply"] = zeros_like(market.exports) as described above.
So I'll keep investigating what's going on in step 2 that makes things break down the line.
import pandas as pd
import xarray as xr
import numpy as np
timeslices = {
    "all-year.all-week.night": 1460,
    "all-year.all-week.morning": 1460,
    "all-year.all-week.afternoon": 1460,
    "all-year.all-week.early-peak": 1460,
    "all-year.all-week.late-peak": 1460,
    "all-year.all-week.evening": 1460,
}
level_names = ["month", "day", "hour"]
levels = [tuple(k.split(".")) for k in timeslices.keys()]
values = list(timeslices.values())
indices = pd.MultiIndex.from_tuples(levels, names=level_names)
timeslice = xr.DataArray(values, coords={"timeslice": indices}, dims="timeslice")
da1 = xr.DataArray(
    np.arange(36).reshape(2, 3, 6),
    coords=[
        ("x", ["a", "b"]),
        ("y", [0, 1, 2]),
        timeslice.timeslice,
    ],
)
da2 = xr.zeros_like(da1.transpose("y", "x", ...))
da3 = xr.zeros_like(da1)
ds = xr.Dataset({"one": da1, "two": da2, "three": da3})
ds["four"] = xr.zeros_like(ds.one)
print(ds)
For reference, I've narrowed down the problem to this function. The manipulations going on there result in a DataArray with a MultiIndex coordinate that misbehaves. The docstring of that function is quite thorough in case anyone is curious about what it is doing.
Ok, while trying to figure out what's wrong with my code above I'm finding examples that have an odd behaviour or that fail, but for a different reason.
Let's take the last example but where the MultiIndex is added by expanding the dimensions of da1 instead of when creating it.
# as above until here
# ...
da1 = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords=[
        ("x", ["a", "b"]),
        ("y", [0, 1, 2]),
    ],
)
da1 = da1.expand_dims(dim={"timeslice": timeslice.timeslice})
print(da1)
This does not add the MultiIndex coordinate as I was expecting, which would give an array similar to the one above. Instead, it produces the following odd-looking coordinate:
<xarray.DataArray (timeslice: 6, x: 2, y: 3)>
array([[[0, 1, 2],
[3, 4, 5]],
...
[[0, 1, 2],
[3, 4, 5]]])
Coordinates:
* timeslice (timeslice) object ('all-year', 'all-week', 'night') ... ('all...
* x (x) <U1 'a' 'b'
* y (y) int64 0 1 2
To get the proper MultiIndex coordinate, I need to assign it explicitly:
da1 = da1.expand_dims(dim={"timeslice": timeslice.timeslice}).assign_coords(timeslice=timeslice.timeslice)
print(da1)
Resulting in:
<xarray.DataArray (timeslice: 6, x: 2, y: 3)>
array([[[0, 1, 2],
[3, 4, 5]],
...
[[0, 1, 2],
[3, 4, 5]]])
Coordinates:
* timeslice (timeslice) object MultiIndex
* x (x) <U1 'a' 'b'
* y (y) int64 0 1 2
* month (timeslice) object 'all-year' 'all-year' ... 'all-year'
* day (timeslice) object 'all-week' 'all-week' ... 'all-week'
* hour (timeslice) object 'night' 'morning' ... 'late-peak' 'evening'
One would think this should be a perfectly fine DataArray, but when I do either of these things:
ds = xr.Dataset({"one": da1})
ds["four"] = xr.zeros_like(ds.one)
or
ds = xr.Dataset({"one": da1, "two": xr.zeros_like(da1)})
Things fail with:
ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'timeslice' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions
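The conditions listed in that message (index type and the set of coordinate names attached to an index) can be inspected directly. Below is a minimal sketch of such a check. The helper name describe_index is hypothetical, and the example array is built with stack() only to have a known-good MultiIndex to look at; it is not the array from the failing code.

```python
import numpy as np
import xarray as xr


def describe_index(obj, dim):
    """Return the xarray index class name and the coordinates sharing that index."""
    index = obj.xindexes[dim]
    # get_all_coords returns every coordinate variable tied to the same index.
    shared = sorted(obj.xindexes.get_all_coords(dim))
    return type(index).__name__, shared


# A stacked array gives a MultiIndex whose levels are the stacked coordinates.
da = xr.DataArray(
    np.zeros((2, 3)),
    coords=[("x", ["a", "b"]), ("y", [0, 1, 2])],
).stack(w=("x", "y"))

print(describe_index(da, "w"))
```

Running the same helper on two arrays that refuse to align might show whether their indexes differ in type or in the coordinates they cover.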
This is not the error I was originally reporting, but it goes along the same lines: a perfectly fine-looking array with a MultiIndex coordinate that misbehaves.
I will keep trying to reproduce the original error, but any suggestion as to why this might be happening with an otherwise fine-looking array would be helpful.
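For comparison, here is a sketch of attaching the MultiIndex coordinate explicitly instead of passing it through expand_dims. It assumes xarray >= 2023.08, where xr.Coordinates.from_pandas_multiindex is available, and it uses a shortened two-entry timeslice for brevity. Whether this avoids the errors reported in this thread has not been verified.

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_tuples(
    [("all-year", "all-week", "night"), ("all-year", "all-week", "morning")],
    names=["month", "day", "hour"],
)
# Assumption: xarray >= 2023.08, where Coordinates.from_pandas_multiindex exists.
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, "timeslice")

base = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords=[("x", ["a", "b"]), ("y", [0, 1, 2])],
)
# Add a new leading "timeslice" axis by size only, then attach the index
# coordinate together with its level coordinates in one step.
expanded = base.expand_dims(timeslice=len(midx)).assign_coords(midx_coords)
print(expanded.coords)
```

The point of going through a Coordinates object is that the dimension coordinate and its level coordinates are created together from the same pandas MultiIndex.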
@dalonsoa the examples in your last comment are working now with #8094, i.e.,
ds = xr.Dataset({"one": da1})
ds["four"] = xr.zeros_like(ds.one)
and
ds = xr.Dataset({"one": da1, "two": xr.zeros_like(da1)})
Could you confirm that #8094 also solves your original issue?
@benbovy , many thanks for the fix. I was on holiday. I'll check if the original issue was also fixed by this as soon as possible, but it is great that, if nothing else, at least part of it is sorted. I'll keep you posted in case it has not been fixed.
FYI, I just checked with xarray 2025.10.0 and the following code still raises the same warning. You might want to consider reopening this issue.
import pandas as pd
import xarray as xr
import numpy as np
timeslices = {
    "all-year.all-week.night": 1460,
    "all-year.all-week.morning": 1460,
    "all-year.all-week.afternoon": 1460,
    "all-year.all-week.early-peak": 1460,
    "all-year.all-week.late-peak": 1460,
    "all-year.all-week.evening": 1460,
}
level_names = ["month", "day", "hour"]
levels = [tuple(k.split(".")) for k in timeslices.keys()]
values = list(timeslices.values())
indices = pd.MultiIndex.from_tuples(levels, names=level_names)
timeslice = xr.DataArray(values, coords={"timeslice": indices}, dims="timeslice")
da1 = xr.DataArray(
    np.arange(36).reshape(2, 3, 6),
    coords=[
        ("x", ["a", "b"]),
        ("y", [0, 1, 2]),
        timeslice.timeslice,
    ],
)
da2 = xr.zeros_like(da1.transpose("y", "x", ...))
da3 = xr.zeros_like(da1)
ds = xr.Dataset({"one": da1, "two": da2, "three": da3})
ds["four"] = xr.zeros_like(ds.one)
print(ds)
Thanks for reopening! Here's also a more succinct code example that raises the warning in case it's helpful:
import pandas as pd
import xarray as xr
da = xr.DataArray([0], coords=[("x", pd.MultiIndex.from_tuples([(0, 0)]))])
ds = xr.Dataset({"one": da})
ds["two"] = xr.zeros_like(ds.one)
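In case it helps anyone comparing versions, here is a sketch that records the warnings emitted by the assignment instead of letting them surface, and that also tries assigning a bare Variable (which carries no coordinates) in place of a DataArray. The helper name assign_and_collect is hypothetical, the MultiIndex is built with stack() rather than from tuples, and the idea that the Variable route sidesteps the warning is an assumption that has not been checked across versions.

```python
import warnings

import numpy as np
import pandas as pd
import xarray as xr

# Build the MultiIndex coordinate via stack() so the example does not depend
# on passing a pandas MultiIndex directly to the constructor.
da = xr.DataArray(
    np.arange(4).reshape(2, 2),
    coords=[("a", [0, 1]), ("b", ["u", "v"])],
).stack(x=("a", "b"))
ds = xr.Dataset({"one": da})


def assign_and_collect(dataset, name, value):
    """Assign `value` on a copy of `dataset`; return the copy and warning messages."""
    out = dataset.copy()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, do not raise
        out[name] = value
    return out, [str(w.message) for w in caught]


# Same assignment as in the report: a DataArray that carries the index coordinates.
with_dataarray, msgs_da = assign_and_collect(ds, "two", xr.zeros_like(ds["one"]))
# Variant: a bare Variable with the same dims and shape but no coordinates.
with_variable, msgs_var = assign_and_collect(
    ds, "two", xr.zeros_like(ds["one"].variable)
)

print("DataArray assignment warnings:", msgs_da)
print("Variable assignment warnings:", msgs_var)
```

Comparing the two printed lists on a given installation shows whether the warning is tied to the coordinates travelling with the DataArray.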