xarray Add invert option to DataArray/Dataset.stack()

trafficstars

This brings in the option to stack all dimensions except for one or more dimensions listed. I find this very useful for quickly iterating over all the combinations of dimensions except for one (i.e. you have a number of input parameters that are parameterized and one time dimension, and you want to calculate some time response for all the combinations of these input parameters and store them in the time-row corresponding to the appropriate combination of inputs).

I played around with implementing to_stacked_array() for DataArray, but this made less sense in the end since that method was really designed for Datasets.

[x] Addresses #8278
[ ] Tests added (added one for DataArray, just now realizing I should have one for Dataset)
[ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
[ ] New functions/methods are listed in api.rst

Oct 18 '23 13:10 carschandler

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. If you have questions, some answers may be found in our contributing guidelines.

Oct 18 '23 13:10 welcome[bot]

Yeah perhaps group_over is the way: https://github.com/pydata/xarray/issues/324

Oct 20 '23 17:10 dcherian

Yeah perhaps group_over is the way: #324

I added a note to my comment above caveating my .groupby suggestion. .groupby has different semantics from "excluding these dimensions" when the indexes aren't unique, and so is not so great a suggestion. .group_over could work, though would be a bigger effort.

Currently ranked vote:

Offer ~xr.ALL_DIMS.exclude("foo")~ edit: since it needs to be an instance, would need to be xr.ALL_DIMS(exclude='foo') — it wouldn't be too difficult, since the code could be added to the infix_dims func and it would work everywhere. I don't love it, but it doesn't intrude on existing APIs
Do nothing — a shame, but a list comp is indeed not too bad. Could wait for .group_over
exclude_dims arg to .stack
invert

Oct 20 '23 17:10 max-sixty

Yeah perhaps group_over is the way: https://github.com/pydata/xarray/issues/324

This is a good reference. Thanks!

Oct 20 '23 17:10 carschandler

Sorry @carschandler if you feel that my https://github.com/pydata/xarray/pull/8332#discussion_r1366593050 was dismissive, it wasn't intended to be so. I definitely don't want to discourage anyone making suggestions / contributions to improve Xarray, and I don't think that any suggestion would obscure the project!

I just wanted to say that in this specific case the addition of an argument like invert or exclude_dims, while it would certainly make some cases a bit more convenient and succinct, is problematic in several ways. Not only regarding the order of the dimensions of a Dataset, but also on how to handle the case where multiple stacked dimensions are given as kwargs to stack. I'm sure we could come up with some basic rules to easily handle those cases (e.g., do not allow multiple stack dim kwargs) but is it worth the trouble? About the ugliness of all other ways of achieving it, I respectfully disagree. While a list comp is slightly more verbose, it is easy to understand. At a first glance the .group_over or xr.ALL_DIMS-based alternatives look also good IMHO.

Oct 21 '23 13:10 benbovy

@benbovy oh no not at all! Hope you didn't think I had taken offense after reading my response. I did genuinely appreciate your feedback and fully understand your sentiment. I understand your concerns about the invert and exclude_dims options, and your point about the list comprehension is valid as well. I agree that pursuing one of the other alternatives mentioned is probably better. At the end of the day, I just think it would be nice to be able to quickly iterate over rows across one (or more) dimension(s); whether it's in stack or some other method is less important.

Oct 23 '23 13:10 carschandler

Thanks @carschandler ! Very much appreciate your attitude and approach here!

What do folks think about xr.ALL_DIMS(exclude=['foo', 'bar'])?

To do this, we would make xr.ALL_DIMS into a small class and store exclusions. And then in infix_dims, we'd check for the class and materialize them based on the current dims.

I don't love it — it's not that much cleaner than [d for d in ds.dims if d not in ['foo', 'bar']]. But it's also not intrusive, doesn't require widespread changes or maintenance — I don't see much downside (maybe another item on the Dims type definition; though maybe not since it's Hashable...)

Oct 23 '23 19:10 max-sixty

@max-sixty I agree with all your points. Not as concise as it could be in group_over, but I do think that the syntax is still marginally cleaner than the list comprehension, and I think its meaning is more quickly conveyed to the reader. It also provides a canonical way to get all dims except one, and I always appreciate when a project gives me a way to do something; gives me more confidence in my code. On the other hand, it's nice to have pure equality between ... and ALL_DIMS, but I don't think that that's a good enough reason to reject it.

If it is much work, then I would say it's not worth worrying about it and that list comprehension (or even da.to_dataset().to_stacked_array()) would work for now until group_over is ready. If it's a simple change, I think it's a nice convenience.

Oct 23 '23 19:10 carschandler

One other thought is that when doing chained operations, list comprehensions can become cumbersome if what you want to stack is actually some indexed version of an array. Of course you could assign to another array intermediately, but once again this is clunkier and may not always create a view depending on your indexing.

Oct 24 '23 18:10 carschandler

xarray xarray copied to clipboard

Add invert option to DataArray/Dataset.stack()

xarray
xarray copied to clipboard