xarray
xarray copied to clipboard
Allow no padding for rolling windows
- [x] Fixes #4743
- [x] Tests added
- [x] Passes
pre-commit run --all-files
- [ ] User visible changes (including notable bug fixes) are documented in
whats-new.rst
Related to #2007.
Unit Test Results
6 files 6 suites 1h 9m 48s :stopwatch: 17 081 tests 15 238 :heavy_check_mark: 1 735 :zzz: 108 :x: 95 682 runs 86 859 :heavy_check_mark: 8 175 :zzz: 648 :x:
For more details on these failures, see this check.
Results for commit 5a2eadc4.
:recycle: This comment has been updated with latest results.
FWIW, the test failures are all the same error, which should be fixed, but should have nothing to do with this PR.
Thanks @kmsquire I'll try and review this soon.
Hello @kmsquire! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2021-08-11 16:38:29 UTC
@dcherian Thanks. I thought I was done, but I'm finding things that aren't working as expected. I'm working on adding more tests for expected behavior, so marking as WIP. Will ping back here when I'm finished (maybe today, maybe early next week).
@dcherian Okay, I think this is in good shape. I added some more tests, and fixed a few more bugs. Most of the fixes have been squashed back down to the original commit.
I left the second commit separate for now because it's breaking. Previously, iterating over a rolling window returned only returned blocks present in the original array, and ignored the chunk of nan
s before the start or after the end of the array. This meant that the view of the DataArray
or Dataset
that was returned for each iteration was potentially a different size.
Here, instead, the iterator was changed so that each returned view matches the corresponding slice of the output of the construct()
function. To me, this seems more intuitive, and makes it easier to develop algorithms without having to call construct.
Other than the fact that it's breaking, the main drawback (and difference with construct()
) is that the windows themselves are labeled with coordinates along the rolling axis, except for any nan
values before the start or after the end of the actual data, which do not have labels. For comparison, in construct, the window dimension does not have any coordinates associated with it by default.
After writing this, I'm wondering if it might be useful to simply drop the coordinates along the rolling axis, so that the behavior matches the behavior of construct()
even more closely.
I'm open to thoughts/comments/criticisms/suggestions/questions/whatever.
Also, FWIW, the test failure was caused by a problem in zarr/fsspec (https://github.com/intake/filesystem_spec/issues/707), which is fixed in master on fsspec (https://github.com/intake/filesystem_spec/pull/710).
So it should be fixed here whenever fsspec makes a release.
@dcherian Thank you for reviewing. I've started working through your comments.
However, pad has so many arguments that it might become a nightmare to thread them all through. So maybe just sticking with pad=False is ok.
We could also do pad={"mode": "wrap"}
or pad=True, pad_kwargs={...}
I think i prefer .pad(time=5, mode="wrap").rolling(time=5, pad=False)
but it does require specifying the window length twice. I was going to bring this up at our next dev meeting (tomorrow!)
cc @pydata/xarray
The code also introduces a problem, the following no longer works:
xr.DataArray([1, 2, 3], coords=dict(x=[1, 1, 1])).rolling(x=3).mean()
thus you may have to use
isel
instead ofsel
if possible. Therefore I was also unable to test the following (from #2007 (comment)):monthly.pad(month=n_months, mode="wrap").rolling(center=True, month=n_months, pad=False).mean(skipna=False)
Fixed and added a test for this. The example at the bottom now works.
Mentioned in one of the comments above, but I think I've reached about the amount of time that I can spend on this right now. If there are other minor changes, please do let me know. I can also back out the breaking change if desired (although that will probably take some commit surgery).
Thanks @kmsquire we can take it from here.
There was some discussion about syntax at the dev meeting today. There were multiple votes in favour of allowing full control of padding in the rolling object itself. So we could have
-
.rolling(time=5, x=3, pad={"x": {"mode": "wrap"}, "time": False})
, OR -
.rolling(time=5, x=3).pad({"x": {"mode": "wrap"}, "time": False})
@pydata/xarray thoughts?
There was some discussion about syntax at the dev meeting today. There were multiple votes in favour of allowing full control of padding in the rolling object itself. So we could have
.rolling(time=5, x=3, pad={"x": {"mode": "wrap"}, "time": False})
, OR.rolling(time=5, x=3).pad({"x": {"mode": "wrap"}, "time": False})
FWIW, I think 1 would be more efficient (but perhaps harder to implement). With this one, the DataArrayRolling
object is only created once, with all of the required information.
If 2 were implemented, what should the return value of the .pad()
call be? Would it call construct
on the rolling
object and return a DataArray
? Or would it return an updated DataArrayRolling
object? Or would it be a new type entirely (e.g., DataArrayPaddedRolling
)?
Your points about (2) being hard to interpret are a good reason to go with (1)! I was thinking it would return a DatasetRolling object but that's not the only interpretation as you point out.
Hey all, has there been any movement on this PR, or maybe elsewhere? It seems like a nice feature to me.