xarray
xarray copied to clipboard
New alignment option: "exact" without broadcasting OR Turn off automatic broadcasting
Is your feature request related to a problem?
If we have two objects with dims x
and x1
, then xr.align(..., join="exact")
will pass because these dimensions are broadcastable.
I'd like a stricter option (join="strict"
?) that disallows broadcasting.
Describe the solution you'd like
xr.align(
xr.DataArray([1], dims="x"),
xr.DataArray([1], dims="x1"),
join="strict",
)
would raise an error.
It'd be nice to have this as a built-in option so we can use
with xr.set_options(arithmetic_join="strict"):
...
Describe alternatives you've considered
An alternative would be to allow control over automatic broadcasting through the set_options
context manager., but that seems like it would be more complicated to implement.
Additional context
This turns up in staggered grid calculations with xgcm where it is easy to mistakenly construct very high-dimensional arrays because of automatic broadcasting.
This turns up in staggered grid calculations with xgcm where it is easy to mistakenly construct very high-dimensional arrays because of automatic broadcasting.
Do you have an example that illustrates this issue?
I'm curious to see if in that specific case using a custom index for staggered grid coordinates wouldn't work as an alternative solution. The alignment rules are pretty strict (same index type + same sequence of coordinate names & dims). Not 100% sure if that applies in the xgcm case, but in theory it would raise an error regardless of the join method if you try to align objects that have broadcastable coordinates with incompatible indexes.
I'm curious to see if in that specific case using a custom index for staggered grid coordinates wouldn't work as an alternative solution. The alignment rules are pretty strict (same index type + same sequence of coordinate names & dims). Not 100% sure if that applies in the xgcm case, but in theory it would raise an error regardless of the join method if you try to align objects that have broadcastable coordinates with incompatible indexes.
This is on my to-do list to think about really hard :sweat_smile:
I think one problem would be that sometimes a different grid position implies a different length coordinate array, e.g. "outer" vs "center" is larger by one element.
Basically I want the following to raise an error because I've picked the wrong variable with dimension x1
instead of dimension x
by mistake
xr.DataArray([1], dims="x") / xr.DataArray([1], dims="x1")
I think we could do it with a "staggered grid index", but it seems useful in general
Ah ok I see, thanks! Yes I agree this may be useful.
@dcherian — I'm discussing this with @etienneschalk over at #8698
Can we clarify your proposal? Is it:
- All dims must match — i.e.
ds1.dims == ds2.dims
, as well as each dim being the same size as its counterpart? - ...or something more muted, like each dim being the same size as its counterpart, and not broadcasting a dimension of size 1 to the same-named dimension of size
n
?
The first seems very strict, I haven't hit that personally, but possibly others have...
Thoughts from the meeting:
"join"
is about indexes, and alignment and broadcasting should be orthogonal ops.
So instead of a new join
option, we'll add an option to control the broadcasting. So something like:
-
align(..., join='exact', broadcast=False)
# new kwargbroadcast
- And a new global option
arithmetic_broadcast
to complementarithmetic_join
. This will control automatic broadcasting in binary ops.
Hello @dcherian
I adapted changes made in #8698 with the added clear separation between the join
and broadcast
keyword parameters.
So while the form (API) changed, I did not change the current substance, and it is still not entirely clear for me, what is expected to implement this issue successfully. I wrote a small text following the discussions with @max-sixty about the changes I currently made, and my current understanding: https://github.com/pydata/xarray/pull/8698#issuecomment-1936982530 ; is this aligned with what you would expect? (pun intended... 😅)
You can have a look at test_broadcast_on_vs_off_global_option in the PR, testing the behaviour you described in your comment
@etienneschalk I sincerely apologized. I deeply misunderstood what align
was doing when I wrote this issue. align
isn't actually broadcasting, it is simply checking indexes (as it should).
Given the discussion in the meeting, I think the simplest solution is to raise an error here if arithmetic_broadcast is False
:
https://github.com/pydata/xarray/blob/ff0d056ec9e99dece0202d8d73c1cb8c6c20b2a1/xarray/core/variable.py#L2265-L2268
This should be a significantly smaller and easier PR.
Again, apologies for leading you down the wrong path.
No problem, even if the PR is not solving what was expected, it was the opportunity for me to dig into the xarray internal logic which is valuable learning!
I can try to implement your suggestion, reusing the existing 'arithmetic_broadcast' flag