Feature: allow burn-in in createTimeSlices
Thanks for making one of the best R packages ever!
I'd like to suggest a minor feature for the function createTimeSlices
inside https://github.com/topepo/caret/blob/master/pkg/caret/R/createDataPartition.R
Some validation test statistics have proofs that require the training and test samples to be separated by a small burn-in sample, so that the two samples are not dependent (mainly to address residual dependence when the model is misspecified). For instance, Proposition 3 in Chapter 4 of Andree, B. P. J. (2020), Theory and Application of Dynamic Spatial Time Series Models, Rozenberg Publishers and the Tinbergen Institute, proposes a Diebold-Mariano statistic that tests the significance of log-likelihood differences on a validation sample with a small burn-in.
Below is a simple modification of the time slices function that would make such things easier to execute.
createTimeSlices <- function(y, initialWindow, horizon = 1, fixedWindow = TRUE,
                             skip = 0, burnin = 0)
{
  # The burn-in must leave at least one observation in each test slice
  stopifnot(burnin < horizon)

  # Last index of each training window
  stops <- seq(initialWindow, (length(y) - horizon), by = skip + 1)

  # First index of each training window: rolling if fixedWindow, else expanding
  if (fixedWindow) {
    starts <- stops - initialWindow + 1
  } else {
    starts <- rep(1, length(stops))
  }

  train <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
  # Skip the first `burnin` observations after each training window
  test <- mapply(seq, stops + 1 + burnin, stops + horizon, SIMPLIFY = FALSE)

  nums <- gsub(" ", "0", format(stops))
  names(train) <- paste("Training", nums, sep = "")
  names(test) <- paste("Testing", nums, sep = "")

  list(train = train, test = test)
}
Here I'm using it with a single observation as burn-in:
> createTimeSlices(1:10, 5, 3, TRUE, 0, 1)
$train
$train$Training5
[1] 1 2 3 4 5
$train$Training6
[1] 2 3 4 5 6
$train$Training7
[1] 3 4 5 6 7
$test
$test$Testing5
[1] 7 8
$test$Testing6
[1] 8 9
$test$Testing7
[1] 9 10
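As a quick sanity check, the gap between each training window and its test window can be computed directly from the slices. This is only a sketch: the `slices` list below is copied by hand from the output above rather than produced by a function call, and `gap` is a name introduced here for illustration.

```r
# Slices copied from the output above (initialWindow = 5, horizon = 3, burnin = 1)
slices <- list(
  train = list(Training5 = 1:5, Training6 = 2:6, Training7 = 3:7),
  test  = list(Testing5 = 7:8, Testing6 = 8:9, Testing7 = 9:10)
)

# Number of observations skipped between the last training index
# and the first test index of each slice
gap <- mapply(function(tr, te) min(te) - max(tr) - 1,
              slices$train, slices$test)

# Number of observations left in each test slice (horizon - burnin)
test_sizes <- lengths(slices$test)

print(gap)
print(test_sizes)
```

With a burn-in of 1 and a horizon of 3, every gap should equal 1 and every test slice should keep 2 observations.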
I added a simple error message when the burn-in sample leads to discarding the entire validation sample:
> createTimeSlices(1:10, 5, 3, TRUE, 0, 10)
Error in createTimeSlices(1:10, 5, 3, TRUE, 0, 10) :
burnin < horizon is not TRUE
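If a more descriptive message is preferred over the default `stopifnot` output, the check could be factored into a small helper. The sketch below is only one possible shape: `check_burnin` is a hypothetical name that is not part of caret, and the extra requirement that `burnin` be a non-negative whole number is an assumption added here, not something the proposal above enforces.

```r
# Hypothetical helper (not part of caret): validates the burn-in length
check_burnin <- function(burnin, horizon) {
  # Assumption: burnin should be a single non-negative whole number
  if (!is.numeric(burnin) || length(burnin) != 1 || is.na(burnin) ||
      burnin < 0 || burnin != round(burnin)) {
    stop("`burnin` must be a single non-negative whole number", call. = FALSE)
  }
  # Same condition as stopifnot(burnin < horizon), with a clearer message
  if (burnin >= horizon) {
    stop(sprintf("`burnin` (%s) must be smaller than `horizon` (%s); otherwise no test observations remain",
                 burnin, horizon),
         call. = FALSE)
  }
  invisible(TRUE)
}

check_burnin(1, 3)  # passes silently

# Capture the error message for an invalid burn-in instead of stopping
msg <- tryCatch(check_burnin(10, 3), error = function(e) conditionMessage(e))
```

Calling `check_burnin(burnin, horizon)` at the top of the function would then replace the `stopifnot` line.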
Kind regards, Bo