imputeTS
imputeTS copied to clipboard
Update na_interpolation.R to allow modifiable interpolation rule for approx
Adding option for users to alter the approximation rule to describe how interpolation outside the interval [min(x), max(x)] should take place.
Hey Ron, thanks for your pull request! Just seen this yesterday, as I was coming back from hiking trip in the mountains.
I definitely see the problem you want to solve, nice find! Also agree, that a fix would be nice.
Problem Assessment:
The current internal code is interp <- stats::approx(indx, data_vec[indx], 1:n, rule = 2, ...)$y
. The rule = 2
is somehow needed because the default behavior should be no NAs after imputations.
Downside is: the user has currently no chance of manually specifying rule = 1
, to keep remaining NAs if no "extrapolation" is wanted. All other additional parameters of approx
can be used via ... , which works quite fine. But since rule
is already used, it won't be passed through to approx.
I think I would prefer some adaptions to this suggestions to increase usability. Here are my thoughts:
The new rule
parameter would only apply to option ="approx"
and option ="linear"
, which might be confusing. Additionally rule
as parameter name, while also used by approx
seems to be rather uninformative.
What do you think about the following two options:
- Instead of a new parameter
rule
add a parameterna_remaining
similar to na_locf
na_remaining | Method to be used for remaining NAs.
- "keep" - to return the series with NAs
- "rm" - to remove remaining NAs
- "mean" - to replace remaining NAs by overall mean
- "rev" - to perform nocb / locf from the existing direction
- Don't set
rule
as new parameter (to prevent to much parameters) but enable passing through to approx (while keeping rule = 2 as default)
In this case we would need a check at the beginning of the function, something like this:
if (rule is in supplied parameters )
xyz <- value of supplied rule parameter
else
xyz <- default value of 2
and later call
approx( ... rule = xyz)
Not quite sure if checking if rule is in supplied parameters (without being a predefined parameter) is reliably possible ... we would have to check this option.
What do you think?
Hey Steffen,
Hiking in the mountains, ey? Awesome--looking forward to getting back into it here in the PNW now that the smoke is clearing, slowly but surely.
Agree that simply setting rule won't be applicable for all desired extrapolations. Either suggestion of yours could work, but because rule could also include more complex parameters (e.g. 1:2), perhaps solution #1 by itself would preclude that flexibility should the user want it? I think the second solution would allow both; but the first solution would allow folks to get most of the way there without the need to understand the underpinnings of approx, so maybe both?
What do you think? Happy to modify my PR to try them out when I get some time.
Thanks for the thorough follow-up!
Ron
On Fri, Sep 18, 2020 at 10:48 AM Steffen Moritz [email protected] wrote:
Hey Ron, thanks for your pull request! Just seen this yesterday, as I was coming back from hiking trip in the mountains.
I definitely see the problem you want to solve, nice find! Also agree, that a fix would be nice.
Problem Assessment: The current internal code is interp <- stats::approx(indx, data_vec[indx], 1:n, rule = 2, ...)$y. The rule = 2 is somehow needed because the default behavior should be no NAs after imputations.
Downside is: the user has currently no chance of manually specifying rule = 1, to keep remaining NAs if no "extrapolation" is wanted. All other additional parameters of approx can be used via ... , which works quite fine. But since rule is already used, it won't be passed through to approx.
I think I would prefer some adaptions to this suggestions to increase usability. Here are my thoughts: The new rule parameter would only apply to option ="approx" and option ="linear", which might be confusing. Additionally rule as parameter name, while also used by approx seems to be rather uninformative.
What do you think about the following two options:
- Instead of a new parameter rule add a parameter na_remaining similar to na_locf
na_remaining | Method to be used for remaining NAs.
- "keep" - to return the series with NAs
- "rm" - to remove remaining NAs
- "mean" - to replace remaining NAs by overall mean
- "rev" - to perform nocb / locf from the existing direction
- Don't set rule as new parameter (to prevent to much parameters) but enable passing through to approx (while keeping rule = 2 as default)
In this case we would need a check at the beginning of the function, something like this:
if (rule is in supplied parameters ) xyz <- value of supplied rule parameter else xyz <- default value of 2
and later call approx( ... rule = xyz)
Not quite sure if checking if rule is in supplied parameters (without being a predefined parameter) is reliably possible ... we would have to check this option.
What do you think?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/SteffenMoritz/imputeTS/pull/47#issuecomment-695001580, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2BGZP7LIMURBKRWUBRU7DSGOMPJANCNFSM4RCGCLKA .
Hey Ron, this would be really great, if you can modify the PR.
I thought a day about it and I am slightly in favor of suggestion 2. Like you said, it keeps the flexibility for the user. (but if you prefer suggestion 1 I also would not mind if you change it to this)
I quickly googled (https://stackoverflow.com/questions/9877271/how-to-check-existence-of-an-input-argument-for-r-functions) and the hasArg
solution proposed in the answers looked quite promising to check if an argument is given in the function call. For the one example I tried it worked ; ) So you probably will have to try and play around a little to check if it really works in the package setting xD
Would be nice, if you also could add one or two tests of the new feature to the unit tests. (tests/testhat/test-na_interpolation.R) to make sure it does what it is supposed to do. Maybe also one line in the NEWS.md file about the update.
And of course: Don't forget to add yourself in the DESCRIPTION file as a contributor!
I am really happy to have people like you contributing to the package :)
Just to mention: No hurries - no worries if the update takes longer.
Yessir, it's on my to do list. Just been inundated with my day job but should have time soon. :)
On Thu, Oct 22, 2020 at 3:11 AM Steffen Moritz [email protected] wrote:
Just to mention: No hurries - no worries if the update takes longer.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/SteffenMoritz/imputeTS/pull/47#issuecomment-714387570, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2BGZMXXZSUMD2QQA2B5XDSMAAORANCNFSM4RCGCLKA .
Finally got around to refactoring this a bit @SteffenMoritz while on paternity leave if it's still of interest to incorporate. :)
Yes, definitely still of interest. Happy to hear from you! I'll check this during the next week.
Hey Ron, @ronaldhause, sorry this took me so long to check. I've been planning to submit a new version (including the pull request) to CRAN since ages, but somehow I've constantly been busy with other things.
Quite elegant solution - I really like it! Checked it and it seems to work fine. Also thanks for directly adding unit tests. Thanks a lot I really appreciate it.
I will submit the new version (including your pull request) and plenty other changes finally today or tomorrow.