resamplr
Feature Request: Support resampling of resample objects
I have a data frame on which I am running models on heavily overlapping subsets. Within each subset I then use time-series cross-validation. Storing a separate copy of each subset of the data frame would be very inefficient.
As a reproducible example:
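```r
library(resamplr)
library(tibble)
library(purrr)
library(dplyr)
n <- 10
df <- tibble(
  x = 1:n,
  y = 2 * 1:n
)
# Leave-one-out style subsets: subset i drops row i
samples <- resample_df(df, map(1:n, ~ setdiff(1:n, .)))
samples
```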
`samples` holds the heavily overlapping subsets of the data frame. I can then run time-series cross-validation on each subset with:
```r
samples_crossv <- samples %>%
  mutate(sample = map(sample, ~ as.data.frame(.) %>% crossv_ts()))
```
However, converting each subset with `as.data.frame()` loses the pointer to the original data frame. You can see this by creating `samples_dfs` and comparing the object sizes:

```r
samples_dfs <- mutate(samples, sample = map(sample, as.data.frame))
# pryr's object_size() counts memory shared between components only once
pryr::object_size(samples)
pryr::object_size(samples_dfs)
```

My data frame is wide and the subsets overlap heavily, so this would be a very useful feature.
Sorry for the delay. That's completely doable and should be pretty easy: all the resampling functions have non-exported versions that take the length of the vector as input and return the corresponding resamples. I plan on doing one more big rewrite of this package soon (next week) and then submitting it to CRAN.
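The length-based approach described in the reply can be sketched in base R. This is a minimal sketch, not resamplr's implementation. `ts_splits_idx()` and `nested_ts_splits()` are hypothetical helpers, and the expanding-window scheme with `min_train` and `horizon` is an assumed stand-in for whatever `crossv_ts()` actually does. Each split is a plain list rather than a resamplr object. The key step is composing indices: positions within a subset are mapped back to rows of the original data frame with `parent_idx[child_idx]`. As a result, every split stores only integers plus a reference to the shared `df`.

```r
# Hypothetical index-only helper (not part of resamplr): expanding-window
# time-series splits on positions 1..n. min_train and horizon are assumptions.
ts_splits_idx <- function(n, min_train = 3L, horizon = 1L) {
  if (n - horizon < min_train) return(list())
  lapply(seq(min_train, n - horizon), function(k) {
    list(train = seq_len(k), test = k + seq_len(horizon))
  })
}

# Hypothetical helper: build the splits within one subset, then map the
# positions back to rows of the original data frame via parent_idx.
nested_ts_splits <- function(df, parent_idx, ...) {
  splits <- ts_splits_idx(length(parent_idx), ...)
  lapply(splits, function(s) {
    list(
      data  = df,                    # copy-on-modify: shared, not duplicated
      train = parent_idx[s$train],
      test  = parent_idx[s$test]
    )
  })
}

n <- 10
df <- data.frame(x = 1:n, y = 2 * 1:n)
subsets <- lapply(1:n, function(i) setdiff(1:n, i))   # leave-one-out subsets
nested <- lapply(subsets, function(idx) nested_ts_splits(df, idx))
```

For example, the first subset drops row 1, so its first split trains on rows 2, 3, 4 of `df` and tests on row 5. Because the splits hold only row numbers, their size grows with the number of rows in each subset, not with the width of the data frame.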