xts
xts copied to clipboard
na.fill long call times
na.fill()
takes much longer than other methods to fill NA values. I expected near constant time to find and fill NA values.
Reproducible timings are below. I will send testData
via email independently as it is too large for this forum.
testData[is.na(testData)] = 0
Unit: milliseconds
expr min lq mean median uq max neval
testData[is.na(testData)] 26.52425 35.31587 37.19453 37.23217 38.80555 46.61541 10
> microbenchmark::microbenchmark(na.fill(testData,0),times = 1)
Unit: seconds
expr min lq mean median uq max neval
na.fill(testData, 0) 150.9054 150.9054 150.9054 150.9054 150.9054 150.9054 1
Unit: seconds
expr min lq mean median uq max neval
na.fill(coredata(testData), 0) 52.90975 52.90975 52.90975 52.90975 52.90975 52.90975 1
Session Info
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] roll_1.0.7 PerformanceAnalytics_1.5.2 xts_0.10-2.2 zoo_1.8-2 TTR_0.23-3 dygraphs_1.1.1.5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 magrittr_1.5 xtable_1.8-2 lattice_0.20-35 R6_2.2.2 quadprog_1.5-5 xlsx_0.6.1 tools_3.5.1 grid_3.5.1 data.table_1.11.4 qmao_1.6.11
[12] R.oo_1.22.0 htmltools_0.3.6 yaml_2.1.19 RcppParallel_4.4.0 digest_0.6.15 RJSONIO_1.3-0 shiny_1.0.5 rJava_0.9-10 later_0.7.3 microbenchmark_1.4-4 promises_1.0.1
[23] bitops_1.0-6 htmlwidgets_0.9 R.utils_2.6.0 xlsxjars_0.6.1 RCurl_1.95-4.10 curl_3.2 mime_0.5 pander_0.6.2 compiler_3.5.1 R.methodsS3_1.7.1 jsonlite_1.5
[34] httpuv_1.4.4.2
Note that there is no na.fill.xts()
method, so zoo:::na.fill.zoo()
is dispatched. We can probably add an xts method, and only dispatch to the zoo method if fill
is not a scalar.
That would be consistent, just curious why the zoo method is so much slower than simply specifying the condition in brackets.
On Wed, Jul 25, 2018 at 1:43 PM, Joshua Ulrich [email protected] wrote:
Note that there is no na.fill.xts() method, so zoo:::na.fill.zoo() is dispatched. We can probably add an xts method, and only dispatch to the zoo method if fill is not a scalar.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/joshuaulrich/xts/issues/259#issuecomment-407855781, or mute the thread https://github.com/notifications/unsubscribe-auth/AHKwqnDtyu2BaFf5gx1klzac3DjMVSF3ks5uKLxXgaJpZM4Vgcoi .
@ckatsulis it looks like a lot of the time is spent in creating symbol names from the ...
arguments passed to merge.xts()
. This is related to #248.
Hi @ckatsulis, the fix for #248 improves performance for this case fairly dramatically. Though it's still not nearly as fast as x[is.na(x)] <- 0
. Is the new performance sufficient to close this issue?
x <- .xts(1:1e7, 1:1e7)
is.na(x) <- sample(1e7, 1e6)
microbenchmark::microbenchmark(na.fill(x, 0), times = 1)
# Unit: seconds
# expr min lq mean median uq max neval
# na.fill(x, 0) 1.944266 1.944266 1.944266 1.944266 1.944266 1.944266 1
y <- coredata(x)
microbenchmark::microbenchmark(na.fill(y, 0), times = 1)
# Unit: seconds
# expr min lq mean median uq max neval
# na.fill(y, 0) 6.566733 6.566733 6.566733 6.566733 6.566733 6.566733 1
microbenchmark::microbenchmark(X[is.na(X)] <- 0, times = 1, setup = X <- x)
# Unit: milliseconds
# expr min lq mean median uq max neval
# X[is.na(X)] <- 0 176.9659 176.9659 176.9659 176.9659 176.9659 176.9659 1
Performance using Rcpp ifelse()
library(xts)
Rcpp::cppFunction("
NumericMatrix na_fill(const NumericMatrix x, const double fill) {
NumericMatrix r = clone(x);
for (int j=0; j < r.ncol(); j++) {
r(_,j) = ifelse( is_na(r(_,j)), fill, r(_,j) );
}
return r;
}")
f1 = function(x, fill) {
x[is.na(x)] <- fill
x
}
x <- .xts(1:1e7, 1:1e7)
is.na(x) <- sample(1e7, 1e6)
microbenchmark::microbenchmark(na.fill(x, 0),
na_fill(x, 0),
f1(x, 0),
times=100)
Unit: milliseconds
expr min lq mean median uq max neval
na.fill(x, 0) 1082.35488 1146.39916 1178.56886 1175.79914 1203.5146 1309.6379 100
na_fill(x, 0) 71.08533 76.49017 97.88448 89.81974 110.8743 166.9555 100
f1(x, 0) 69.06708 78.01604 98.97025 93.17469 105.8702 177.6743 100
Two orders of magnitude better is more than enough for me. Is na_fill exported in the new xts?
On Wed, Dec 12, 2018 at 11:29 AM harvey131 [email protected] wrote:
Performance using Rcpp ifelse()
library(xts) Rcpp::cppFunction(" NumericMatrix na_fill(const NumericMatrix x, const double fill) { NumericMatrix r = clone(x); for (int j=0; j < r.ncol(); j++) { r(,j) = ifelse( is_na(r(,j)), fill, r(_,j) ); } return r; }") f1 = function(x, fill) { x[is.na(x)] <- fill x } x <- .xts(1:1e7, 1:1e7)is.na(x) <- sample(1e7, 1e6) microbenchmark::microbenchmark(na.fill(x, 0), na_fill(x, 0), f1(x, 0), times=100) Unit: milliseconds expr min lq mean median uq max neval na.fill(x, 0) 1082.35488 1146.39916 1178.56886 1175.79914 1203.5146 1309.6379 100 na_fill(x, 0) 71.08533 76.49017 97.88448 89.81974 110.8743 166.9555 100 f1(x, 0) 69.06708 78.01604 98.97025 93.17469 105.8702 177.6743 100
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/joshuaulrich/xts/issues/259#issuecomment-446673018, or mute the thread https://github.com/notifications/unsubscribe-auth/AHKwqp_LxDNh7N6dHScaIaDp8N7vtM8-ks5u4T0MgaJpZM4Vgcoi .
@ckatsulis the code @harvey131 posted isn't in xts, let alone exported. I'm not likely to add a dependency on compiled code unless there's significant performance improvement over a native R solution. In this case, the Rcpp code is roughly the same as x[is.na(x)] <- 0
.
I was also hoping to make the change upstream in zoo, but time is a constraint, as always.
Agreed. No need if the existing call is the same speed for all intents and purposes.
Chris Sent from my iPhone
On Dec 13, 2018, at 07:23, Joshua Ulrich [email protected] wrote:
@ckatsulis the code @harvey131 posted isn't in xts, let alone exported. I'm not likely to add a dependency on compiled code unless there's significant performance improvement over a native R solution. In this case, the Rcpp code is roughly the same as x[is.na(x)] <- 0.
I was also hoping to make the change upstream in zoo, but time is a constraint, as always.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Idea: zoo::na.fill0()
is significantly faster in this case. Consider updating zoo:::na.fill.zoo()
to call na.fill0()
in this type of special case, or create a na.fill.xts()
method that does it. I'd prefer to keep it all in zoo, so it's in one place.