xarray icon indicating copy to clipboard operation
xarray copied to clipboard

New alignment option: "exact" without broadcasting OR Turn off automatic broadcasting

Open dcherian opened this issue 1 year ago • 4 comments

Is your feature request related to a problem?

If we have two objects with dims x and x1, then xr.align(..., join="exact") will pass because these dimensions are broadcastable.

I'd like a stricter option (join="strict"?) that disallows broadcasting.

Describe the solution you'd like

xr.align(
    xr.DataArray([1], dims="x"),
    xr.DataArray([1], dims="x1"),
    join="strict",
)

would raise an error.

It'd be nice to have this as a built-in option so we can use

with xr.set_options(arithmetic_join="strict"):
    ...

Describe alternatives you've considered

An alternative would be to allow control over automatic broadcasting through the set_options context manager., but that seems like it would be more complicated to implement.

Additional context

This turns up in staggered grid calculations with xgcm where it is easy to mistakenly construct very high-dimensional arrays because of automatic broadcasting.

dcherian avatar Jul 18 '22 18:07 dcherian

This turns up in staggered grid calculations with xgcm where it is easy to mistakenly construct very high-dimensional arrays because of automatic broadcasting.

Do you have an example that illustrates this issue?

I'm curious to see if in that specific case using a custom index for staggered grid coordinates wouldn't work as an alternative solution. The alignment rules are pretty strict (same index type + same sequence of coordinate names & dims). Not 100% sure if that applies in the xgcm case, but in theory it would raise an error regardless of the join method if you try to align objects that have broadcastable coordinates with incompatible indexes.

benbovy avatar Jul 18 '22 20:07 benbovy

I'm curious to see if in that specific case using a custom index for staggered grid coordinates wouldn't work as an alternative solution. The alignment rules are pretty strict (same index type + same sequence of coordinate names & dims). Not 100% sure if that applies in the xgcm case, but in theory it would raise an error regardless of the join method if you try to align objects that have broadcastable coordinates with incompatible indexes.

This is on my to-do list to think about really hard :sweat_smile:

I think one problem would be that sometimes a different grid position implies a different length coordinate array, e.g. "outer" vs "center" is larger by one element.

TomNicholas avatar Jul 18 '22 21:07 TomNicholas

Basically I want the following to raise an error because I've picked the wrong variable with dimension x1 instead of dimension x by mistake

xr.DataArray([1], dims="x") / xr.DataArray([1], dims="x1")

I think we could do it with a "staggered grid index", but it seems useful in general

dcherian avatar Jul 18 '22 21:07 dcherian

Ah ok I see, thanks! Yes I agree this may be useful.

benbovy avatar Jul 19 '22 12:07 benbovy

@dcherian — I'm discussing this with @etienneschalk over at #8698

Can we clarify your proposal? Is it:

  1. All dims must match — i.e. ds1.dims == ds2.dims, as well as each dim being the same size as its counterpart?
  2. ...or something more muted, like each dim being the same size as its counterpart, and not broadcasting a dimension of size 1 to the same-named dimension of size n?

The first seems very strict, I haven't hit that personally, but possibly others have...

max-sixty avatar Feb 10 '24 18:02 max-sixty

Thoughts from the meeting:

"join" is about indexes, and alignment and broadcasting should be orthogonal ops.

So instead of a new join option, we'll add an option to control the broadcasting. So something like:

  1. align(..., join='exact', broadcast=False) # new kwarg broadcast
  2. And a new global option arithmetic_broadcast to complement arithmetic_join . This will control automatic broadcasting in binary ops.

dcherian avatar Feb 14 '24 17:02 dcherian

Hello @dcherian

I adapted changes made in #8698 with the added clear separation between the join and broadcast keyword parameters.

So while the form (API) changed, I did not change the current substance, and it is still not entirely clear for me, what is expected to implement this issue successfully. I wrote a small text following the discussions with @max-sixty about the changes I currently made, and my current understanding: https://github.com/pydata/xarray/pull/8698#issuecomment-1936982530 ; is this aligned with what you would expect? (pun intended... 😅)

You can have a look at test_broadcast_on_vs_off_global_option in the PR, testing the behaviour you described in your comment

etienneschalk avatar Feb 17 '24 14:02 etienneschalk

@etienneschalk I sincerely apologized. I deeply misunderstood what align was doing when I wrote this issue. align isn't actually broadcasting, it is simply checking indexes (as it should).

Given the discussion in the meeting, I think the simplest solution is to raise an error here if arithmetic_broadcast is False: https://github.com/pydata/xarray/blob/ff0d056ec9e99dece0202d8d73c1cb8c6c20b2a1/xarray/core/variable.py#L2265-L2268

This should be a significantly smaller and easier PR.

Again, apologies for leading you down the wrong path.

dcherian avatar Feb 18 '24 22:02 dcherian

No problem, even if the PR is not solving what was expected, it was the opportunity for me to dig into the xarray internal logic which is valuable learning!

I can try to implement your suggestion, reusing the existing 'arithmetic_broadcast' flag

etienneschalk avatar Feb 19 '24 21:02 etienneschalk