odc-tools
Possible improvements to new xr_percentile function
Hey @kieranricardo and @Kirill888, I've just had a chance to test out the new xr_percentile function. It seems to run far faster than the built-in .quantile functionality in xarray, which is really exciting! I did have a few pieces of feedback, though, which might make it easier to use and to fit into existing workflows.
Issue 1: The function currently returns an xr.Dataset with a data variable for each percentile the user requests. For example, if I request the 0.01 and 0.999 percentiles, these are used to label the new variables by appending the percentiles onto the original variable name.
This approach feels a bit clunky to me, as the user can't anticipate the naming convention used for the new band names, especially as the input 0.01 value is converted to an integer 1. It also causes problems when the user requests finer than 0.01 precision: the 0.999 above is clipped to 0.99 in the band name, so if a user requests both 0.995 and 0.999, they get combined into one in the output.
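To make the collision concrete, here is a minimal sketch. The band_name helper and its "_pc_" suffix are hypothetical and only mimic the behaviour described above; this is not the actual odc-tools code.

```python
def band_name(var: str, q: float) -> str:
    # Hypothetical naming scheme (an assumption for illustration only):
    # scale the quantile to a percentage and truncate it to an integer.
    return f"{var}_pc_{int(q * 100)}"


requested = [0.01, 0.995, 0.999]
names = [band_name("red", q) for q in requested]

# 0.995 and 0.999 both truncate to 99, so the two bands share one name
# and one of them would overwrite the other in the output dataset.
print(names)
print(len(set(names)), "unique names for", len(requested), "requested quantiles")
```

Any scheme that squeezes the quantile into the variable name has to pick a fixed precision, which is why keeping the quantiles as coordinate values avoids the problem.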
The native xarray solution is to instead produce an xr.Dataset with a new "quantile" dimension, which is then labelled with the requested quantiles. This feels a bit more elegant to me:
Issue 2: It would be nice if this function supported non-dask input data too, as for a lot of smaller-scale science team applications we don't always need to use dask throughout an entire workflow. If I try running it on an xr.Dataset in memory, I get this error:
Agree on both counts.
This sounds great, given that quantile doesn't work on dask, I think?
A few hundred times faster than this would be great: 3 variables, 44 bands, approx. 100 million pixels each.