xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Add invert option to DataArray/Dataset.stack()

Open carschandler opened this issue 2 years ago • 9 comments
trafficstars

This brings in the option to stack all dimensions except for one or more dimensions listed. I find this very useful for quickly iterating over all the combinations of dimensions except for one (i.e. you have a number of input parameters that are parameterized and one time dimension, and you want to calculate some time response for all the combinations of these input parameters and store them in the time-row corresponding to the appropriate combination of inputs).

I played around with implementing to_stacked_array() for DataArray, but this made less sense in the end since that method was really designed for Datasets.

  • [x] Addresses #8278
  • [ ] Tests added (added one for DataArray, just now realizing I should have one for Dataset)
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

carschandler avatar Oct 18 '23 13:10 carschandler

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. If you have questions, some answers may be found in our contributing guidelines.

welcome[bot] avatar Oct 18 '23 13:10 welcome[bot]

Yeah perhaps group_over is the way: https://github.com/pydata/xarray/issues/324

dcherian avatar Oct 20 '23 17:10 dcherian

Yeah perhaps group_over is the way: #324

I added a note to my comment above caveating my .groupby suggestion. .groupby has different semantics from "excluding these dimensions" when the indexes aren't unique, and so is not so great a suggestion. .group_over could work, though would be a bigger effort.

Currently ranked vote:

  • Offer ~xr.ALL_DIMS.exclude("foo")~ edit: since it needs to be an instance, would need to be xr.ALL_DIMS(exclude='foo') — it wouldn't be too difficult, since the code could be added to the infix_dims func and it would work everywhere. I don't love it, but it doesn't intrude on existing APIs
  • Do nothing — a shame, but a list comp is indeed not too bad. Could wait for .group_over
  • exclude_dims arg to .stack
  • invert

max-sixty avatar Oct 20 '23 17:10 max-sixty

Yeah perhaps group_over is the way: https://github.com/pydata/xarray/issues/324

This is a good reference. Thanks!

carschandler avatar Oct 20 '23 17:10 carschandler

Sorry @carschandler if you feel that my https://github.com/pydata/xarray/pull/8332#discussion_r1366593050 was dismissive, it wasn't intended to be so. I definitely don't want to discourage anyone making suggestions / contributions to improve Xarray, and I don't think that any suggestion would obscure the project!

I just wanted to say that in this specific case the addition of an argument like invert or exclude_dims, while it would certainly make some cases a bit more convenient and succinct, is problematic in several ways. Not only regarding the order of the dimensions of a Dataset, but also on how to handle the case where multiple stacked dimensions are given as kwargs to stack. I'm sure we could come up with some basic rules to easily handle those cases (e.g., do not allow multiple stack dim kwargs) but is it worth the trouble? About the ugliness of all other ways of achieving it, I respectfully disagree. While a list comp is slightly more verbose, it is easy to understand. At a first glance the .group_over or xr.ALL_DIMS-based alternatives look also good IMHO.

benbovy avatar Oct 21 '23 13:10 benbovy

@benbovy oh no not at all! Hope you didn't think I had taken offense after reading my response. I did genuinely appreciate your feedback and fully understand your sentiment. I understand your concerns about the invert and exclude_dims options, and your point about the list comprehension is valid as well. I agree that pursuing one of the other alternatives mentioned is probably better. At the end of the day, I just think it would be nice to be able to quickly iterate over rows across one (or more) dimension(s); whether it's in stack or some other method is less important.

carschandler avatar Oct 23 '23 13:10 carschandler

Thanks @carschandler ! Very much appreciate your attitude and approach here!

What do folks think about xr.ALL_DIMS(exclude=['foo', 'bar'])?

To do this, we would make xr.ALL_DIMS into a small class and store exclusions. And then in infix_dims, we'd check for the class and materialize them based on the current dims.

I don't love it — it's not that much cleaner than [d for d in ds.dims if d not in ['foo', 'bar']]. But it's also not intrusive, doesn't require widespread changes or maintenance — I don't see much downside (maybe another item on the Dims type definition; though maybe not since it's Hashable...)

max-sixty avatar Oct 23 '23 19:10 max-sixty

@max-sixty I agree with all your points. Not as concise as it could be in group_over, but I do think that the syntax is still marginally cleaner than the list comprehension, and I think its meaning is more quickly conveyed to the reader. It also provides a canonical way to get all dims except one, and I always appreciate when a project gives me a way to do something; gives me more confidence in my code. On the other hand, it's nice to have pure equality between ... and ALL_DIMS, but I don't think that that's a good enough reason to reject it.

If it is much work, then I would say it's not worth worrying about it and that list comprehension (or even da.to_dataset().to_stacked_array()) would work for now until group_over is ready. If it's a simple change, I think it's a nice convenience.

carschandler avatar Oct 23 '23 19:10 carschandler

One other thought is that when doing chained operations, list comprehensions can become cumbersome if what you want to stack is actually some indexed version of an array. Of course you could assign to another array intermediately, but once again this is clunkier and may not always create a view depending on your indexing.

carschandler avatar Oct 24 '23 18:10 carschandler