xarray
Cumulative examples
Here I am trying to add docstring examples to the cumulative functions. I did this by basically copying the method used for the reduction functions. I'm not at all sure I did this correctly, so I'm marking it a draft.
- [ ] Closes #xxxx
- [ ] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in
whats-new.rst
- [ ] New functions/methods are listed in
api.rst
Very nice, this is something that's been on the TODO list! :)
I believe we wanted to rename generate_reductions.py to generate_aggregations.py so cumsum et al. could be included and generated there as well. Is it a lot of work for you to merge these into that one?
@Illviljan That is definitely something I could do. Are there any other methods I should be including in this?
Right now, I think cumsum and cumprod are enough. numpy-groupies has a few more examples that I suppose we could support in the future.
Great I'll start working on that. Shouldn't take too long
Thanks @patrick-naylor !
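For context, here is a hedged sketch of the kind of docstring example being discussed (the data and names are illustrative, not taken from the PR). By default the cumulative methods skip NaN (skipna=True):

```python
import numpy as np
import xarray as xr

# Illustrative data, not from the PR itself.
da = xr.DataArray(np.array([1.0, 2.0, 3.0, np.nan]), dims="time")

# With the default skipna=True, NaNs are skipped rather than propagated.
print(da.cumsum("time").values)   # [1. 3. 6. 6.]
print(da.cumprod("time").values)  # [1. 2. 6. 6.]
```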
Instead of using Dataset.reduce, I think we want something like:
```python
def cumsum(..., dim):
    result = xr.apply_ufunc(
        np.nancumsum if skipna else np.cumsum,  # nancumsum when skipping NaN
        obj,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
        kwargs={"axis": -1},
    )
    # now transpose dimensions back to input order
    return result
```
to fix #6528.
At the moment, this should also work on GroupBy objects quite nicely.
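A runnable version of that sketch, under the assumption of a single dimension and an explicit skipna flag (the name cumsum_via_ufunc is mine, not from the PR):

```python
import numpy as np
import xarray as xr

def cumsum_via_ufunc(obj, dim, skipna=True):
    # Move `dim` to the last axis via core dims, apply the cumulative
    # sum there, then restore the original dimension order.
    result = xr.apply_ufunc(
        np.nancumsum if skipna else np.cumsum,
        obj,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
        kwargs={"axis": -1},
    )
    return result.transpose(*obj.dims)

da = xr.DataArray([[1.0, np.nan], [3.0, 4.0]], dims=("x", "y"))
print(cumsum_via_ufunc(da, "y").values)  # [[1. 1.] [3. 7.]]
```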
Thanks @dcherian, I'll try to work that in. Is there a particular reason why there is no cumprod for GroupBy objects?
> Is there a particular reason why there is no cumprod for GroupBy objects?
Nope. Just wasn't added in :)
> ```python
> def cumsum(..., dim):
>     return xr.apply_ufunc(
>         np.nancumsum if skipna else np.cumsum,
>         obj,
>         input_core_dims=[[dim]],
>         output_core_dims=[[dim]],
>         kwargs={"axis": -1},
>     )
>     # now transpose dimensions back to input order
> ```
I'm running into an issue with variables without the core dimensions. Would it be better to do a workaround in cumsum or in apply_ufunc, like you mentioned in #6391?
> I'm running into an issue with variables without the core dimensions. Would it be better to do a workaround in cumsum or in apply_ufunc, like you mentioned in #6391?
While this is definitely worth an improvement, it is quite out of scope for this PR. Can you define Datasets such that this problem does not occur?
I've merged the cumulative and reduction files into generate_aggregations.py and _aggregations.py. This uses the original version of reductions with an additional statement on the dataset methods that adds the original coordinates back in.
Using apply_ufunc and np.cumsum/cumprod has some issues: they only compute the cumulative along one axis, so iterating through each dimension becomes necessary. This makes it slower than the original functions and also causes some problems with the groupby method.
Happy for any input on how the apply_ufunc approach might be made usable, or on ways to change the current method.
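The per-dimension iteration mentioned here can be sketched like this (a minimal illustration; cumsum_over_dims is a hypothetical helper, not code from the PR):

```python
import numpy as np
import xarray as xr

def cumsum_over_dims(da, dims):
    # np.cumsum accepts a single integer axis, so a cumulative sum over
    # several dimensions has to be applied once per dimension.
    result = da
    for d in dims:
        result = result.cumsum(d)
    return result

da = xr.DataArray(np.ones((2, 2)), dims=("x", "y"))
print(cumsum_over_dims(da, ["x", "y"]).values)  # [[1. 2.] [2. 4.]]
```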
I'm getting a few issues I don't quite understand:
- When running pytest on my local repository I get no errors but it's failing the checks here with a NotImplementedError
- Black is having an issue with some of the strings in generate_aggregations.py. It says it cannot parse what should be valid code.
Thanks!
I don't think you have flox installed; if it's not installed the code will take the old path. Do conda install flox and I think you'll get the NotImplementedError. Then you may have to change the default settings in cumsum so flox is not used.
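If the goal is just to keep the old code path while flox is installed, the use_flox option can be toggled; I'm assuming here that xr.set_options(use_flox=False) is the relevant switch:

```python
import xarray as xr

# Assumption: xr.set_options(use_flox=False) forces the non-flox
# groupby path even when flox is installed.
ds = xr.Dataset({"a": ("x", [1, 2, 3])}, coords={"g": ("x", [0, 0, 1])})
with xr.set_options(use_flox=False):
    result = ds.groupby("g").sum()
print(result["a"].values)  # [3 3]
```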
Thanks for taking this on @patrick-naylor ! This is a decent-sized project!
> Using apply_ufunc and np.cumsum/cumprod has some issues as it only finds the cumulative across one axis which makes iterating through each dimension necessary.

np.cumsum only supports an integer axis, so this is OK?
flox doesn't support cumsum at the moment (https://github.com/xarray-contrib/flox/issues/91), so we can delete that bit and just have one code path.
@dcherian Do you think it would be better to finish this PR as the creation of _aggregations.py to give the cum methods better documentation? Then start a new one to fix #6528?
> Do you think it would be better to finish this PR as the creation of _aggregations.py to give the cum methods better documentation? Then start a new one to fix https://github.com/pydata/xarray/issues/6528?
Sure that would be a good intermediate step. Let us know if you need help.
@keewis do you understand this blackdoc error?
```
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...............................................................Passed
debug statements (python)................................................Passed
mixed line ending........................................................Passed
autoflake................................................................Passed
isort....................................................................Passed
pyupgrade................................................................Passed
black....................................................................Passed
black-jupyter............................................................Passed
blackdoc.................................................................Failed
- hook id: blackdoc
- exit code: 123

error: cannot format /code/xarray/util/generate_aggregations.py: Cannot parse: 201:27: EOF in multi-line string

Oh no! 💥 💔 💥
215 files left unchanged, 1 file fails to reformat.
```
@dcherian cumsum for resample fails for some reason; do you have any ideas?
```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
ds = xr.Dataset(dict(da=da))
a = ds.resample(time="3M")
a.cumsum()
```
```
Traceback (most recent call last):
  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
  File "c:\users\j.w\documents\github\xarray\xarray\util\untitled2.py", line 20, in <module>
    a.cumsum()
  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\_aggregations.py", line 4921, in cumsum
    return self.reduce(
  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\resample.py", line 395, in reduce
    return super().reduce(
  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\groupby.py", line 1357, in reduce
    return self.map(reduce_dataset)
  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\resample.py", line 342, in map
    return combined.rename({self._resample_dim: self._dim})
  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\dataset.py", line 3646, in rename
    return self._rename(name_dict=name_dict, **names)
  File "C:\Users\J.W\Documents\GitHub\xarray\xarray\core\dataset.py", line 3587, in _rename
    raise ValueError(
ValueError: cannot rename '__resample_dim__' because it is not a variable or dimension in this dataset
```
> do you understand this blackdoc error?
so the reason is the position of the triple quotes: with

```python
def f():
    """
    >>> 1 + 2"""
    pass
```

the extracted line will become `1 + 2"""`, which when handed to black results in that error.
Technically, that's a bug (the triple quotes mark the end of the docstring), but one that is a bit tricky to fix: blackdoc is implemented using a line parser, which does not work too well if the transition happens somewhere within a line. My guess is that it would have to start counting quotes, which I've tried to avoid up until now since there are a lot of details to get right (see also keewis/blackdoc#145).
Edit: for now, I guess it would be fine to add something like "The closing quotes are not on their own line." to the error message.
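Concretely, the workaround on the xarray side is to move the closing quotes onto their own line, so the doctest line is no longer the last line of the docstring (a minimal sketch):

```python
def f():
    """
    >>> 1 + 2
    3
    """
    # With the closing quotes on their own line, blackdoc's line parser
    # sees the doctest line and the docstring terminator separately.
    pass
```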
@keewis What do the numbers mean in Cannot parse: 201:27? Because I can't find any functions with docstrings in this file; it's just a bunch of multiline strings assigned to variables, which in my mind shouldn't trigger blackdoc.
well, that's just it: there's no way to discriminate between docstrings and multi-line triple-quoted strings without parsing the python code (which is definitely out of scope), so blackdoc doesn't even attempt to. Instead, any line that begins with >>> is assumed to be a doctest line. With that, the numbers mean: line 201, character 27.
When we created that file we decided to skip the generate_* file, but because the file was renamed that rule no longer works. Could you update the pre-commit configuration?
@patrick-naylor, feel free to try out a better default example if you want.
Thanks @Illviljan, @dcherian, and @keewis so much for the help.