TTR
TTR copied to clipboard
rolling trimmed mean
Description
new functionality supporting rolling trimmed means
when trim * n is not an integer, there is special handling needed for the boundaries:
https://stats.stackexchange.com/questions/4252/how-to-calculate-the-truncated-or-trimmed-mean
very hokey rough implementation.
runTrimmedMean <- function(x, n, cumulative = FALSE, trim) {
if ((trim > 0.5) || (trim < 0)) stop("trim must be between 0,.5")
x <- try.xts(x, error = as.matrix)
if (ncol(x) > 1) stop("only supports univariate x")
if (nrow(x) < n) {
is.na(x) <- T
return(x)
}
v <- as.vector(x)
if (cumulative) {
r <- sapply(seq_len(nrow(x)), FUN = \(n) {mean(v[1:n], na.rm = T, trim = trim)})
} else {
#data.table::frollapply slightly faster than zoo::rollapply
r <- data.table::frollapply(v, n = n, FUN = \(vals) {mean(vals, na.rm = T, trim = trim)})
}
return(reclass(r, x))
}
also, not sure if this should be a standalone function in R, or just add a trim parameter to runMean, like how base R does it
I think that one way to think about this is that there are just variants of a single function, with runMean being an optimization for runTrimmedMean(trim = 0) and runMedian an optimization for runTrimmedMean(trim = .5)
also, not sure if this should be a standalone function in R, or just add a
trimparameter torunMean, like how base R does it
If I may join the discussion; I think having it as a trim-parameter is a good idea - then it could applied on all rolling means there is in TTR.
it seems to me that the place to put this would be in runMean, since it already exists, not as a new function
trim in mean.default is defined as:
if (trim > 0 && n) {
if (is.complex(x))
stop("trimmed means are not defined for complex data")
if (anyNA(x))
return(NA_real_)
if (trim >= 0.5)
return(stats::median(x, na.rm = FALSE))
lo <- floor(n * trim) + 1
hi <- n + 1 - lo
x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
}
so your prototype functionality would deliver the median of the data for any value of trim>0.05
agree this should ideally just extend runMean util i get time to work on a proper C pr, here is a usable R implementation:
runTrimmedMean <- function(x, n, cumulative = FALSE, trim = 0) {
if (trim <= 0) return(runMean(x, n = n, cumulative = cumulative))
if (trim >= 0.5) return(runMedian(x, n = n, cumulative = cumulative))
x <- try.xts(x, error = as.matrix)
if (NCOL(x) > 1) stop("ncol(x) > 1. runSum only supports univariate 'x'")
len.x <- nrow(x)
r <- rep(NA_real_, len.x)
v <- as.vector(x)
start <- max(last(which(is.na(v))) + 1, 1)
if (len.x >= (start + n)) {
trim.start <- floor(trim * n) + 1
trimmed.range <- trim.start:(n + 1 - trim.start)
i <- start + n - 1
buff <- sort(v[start:i])
r[i] <- sum(buff[trimmed.range]) / length(trimmed.range)
while(i < len.x) {
i <- i+1
if (cumulative) {
#expand buffer
n <- n + 1
trim.start <- floor(trim * n) + 1
trimmed.range <- trim.start:(n + 1 - trim.start)
} else {
#remove old value from buffer. will always get found by match, so no need for error handling
buff <- buff[-match(v[i-n], buff)]
}
#add new value to buffer. match + insertion is faster than resorting whole buffer
after <- match(TRUE, v[i] <= buff, nomatch = length(buff) + 1) - 1
buff <- append(buff, v[i], after = after) # < 1 will pre-pend
r[i] <- sum(buff[trimmed.range]) / length(trimmed.range) #na's already handled, so bypass checks in mean
}
}
return(reclass(r, x))
}
Great start! Would it make sense to have unit tests of this function versus something like rollapply that uses mean with a trim value?
yes. thats exactly how i tested this version. ill formalize them, as i need to that anyways
Awesome. You can use 'tinytest' (it's a near drop-in replacement for 'testthat') instead of 'RUnit'. I intend to convert all the TTR tests to 'tinytest' at some point.