2.0.3 vs 4.0+ stacking results differ
Probably a general and naive question, but I recently used caretEnsemble (both caretList and caretStack functions) for a classification analysis with the new version 4.0.1, but when comparing the results obtained with the same script with version 2.0.3, I noticed a dramatic decrease in classification performance. How can I reproduce the same result with the newer version of caretEnsemble?
I changed the defaults for the trainControl in 4.0. Instead of using 25 fold bootstrap sampling (the caret default), I'm using 5 or 10 fold CV.
It's a little convoluted, but you can install the old release, run your code, manually inspect the trainControl you get in the old model, and then install the new release and manually set that train control.
In general, it's a good idea to make sure you're explicitly setting the train control, both for caretStack and for caretList. That will help guarantee it stays the same over time.
Here's an example:
# Fit the old model
devtools::install_github("zachmayer/[email protected]")
set.seed(42)
models_2.0.3 <- caretEnsemble::caretList(
  Sepal.Length ~ .,
  data = iris,
  methodList = c("glm", "rpart")
)
print(sapply(models_2.0.3, function(x) min(x$results$RMSE)))
# glm rpart
# 0.3134526 0.4315253
# Save trainControl to a file
train_control <- models_2.0.3[[1]]$control
saveRDS(train_control, file = "train_control.rds")
# Fit the new model (need to start a new session for this)
devtools::install_github("zachmayer/caretEnsemble@4.0.0")
loaded_train_control <- readRDS("train_control.rds")
set.seed(42)
models_4.0.0 <- caretEnsemble::caretList(
  Sepal.Length ~ .,
  data = iris,
  trControl = loaded_train_control,
  methodList = c("glm", "rpart")
)
print(sapply(models_4.0.0, function(x) min(x$results$RMSE)))
# glm rpart
# 0.3134526 0.4315253
First of all, thank you for your prompt reply! As you suggested, I had explicitly set the train control with method='repeatedcv', number=10, repeats=10 for both caretList and caretStack, but, as I mentioned, the classification performance with version 4.0.1 decreased a lot compared to version 2.0.3. Now, when I try to run the above example on my data, I get a "Bad seeds:" error and I'm trying to figure out why...
trainControl sets random seeds for each CV fold. Make sure your new code and old code have the same number of folds.
Get caretList working first, then work on caretStack!
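For reference, here is a minimal sketch (a sketch, not code from this thread) of building an explicit seeds list for trainControl; caret's "Bad seeds" error usually means the list does not have one element per resample (folds x repeats) plus a final element for the last model fit:
library(caret)
# caret expects a list of length B + 1, where B is the number of resamples
# (here 10 folds x 10 repeats = 100). Each of the first B elements needs at
# least one seed per tuning-parameter combination; the last element is a
# single seed for the final model fit.
n_resamples <- 10 * 10  # folds x repeats
n_tune <- 5             # an upper bound on tuning combinations per model
set.seed(42)
seeds <- vector("list", length = n_resamples + 1)
for (i in seq_len(n_resamples)) seeds[[i]] <- sample.int(10000, n_tune)
seeds[[n_resamples + 1]] <- sample.int(10000, 1)
ctrl <- trainControl(
  method = "repeatedcv", number = 10, repeats = 10,
  seeds = seeds, savePredictions = "final"
)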
Yes, this is exactly what I did, but the two versions still return different results... I will keep trying to work it out...
Good luck!
Dear Zach, I don't want to bore you again or be inappropriate, but I wanted to resubmit the problem with the reproducibility of the results obtained with the two versions of caretEnsemble, v2.0.3 versus v4.0.1.
I reused the example data from above, but changed the resampling method and added more models for a regression problem. Again, running the same code on the two versions of the software produced very different results.
Below is the code used for both the software versions:
set.seed(123)
models <- caretList(
  x = iris[1:50, 1:2],
  y = iris[1:50, 3],
  trControl = trainControl(
    method = "repeatedcv",
    number = 10,
    repeats = 10,
    savePredictions = "final"
  ),
  methodList = c('svmRadial', 'rf', 'pls', 'glm')
)
caretStack(
  models,
  method = "rf",
  trControl = trainControl(
    method = "repeatedcv",
    number = 10,
    repeats = 10,
    savePredictions = "final"
  )
)
## Printed Results obtained with v4.0.1 ##
# The following models were ensembled: svmRadial, rf, pls, glm
#
# caret::train model:
# Random Forest
#
# 50 samples
# 4 predictor
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times)
# Summary of sample sizes: 43, 46, 45, 45, 46, 45, ...
# Resampling results across tuning parameters:
#
# mtry RMSE Rsquared MAE
# 2 0.1663337 0.2667371 0.1318132
# 3 0.1666885 0.2725491 0.1319602
# 4 0.1675535 0.2799366 0.1332284
#
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 2.
#
# Final model:
#
# Call:
# randomForest(x = x, y = y, mtry = param$mtry)
# Type of random forest: regression
# Number of trees: 500
# No. of variables tried at each split: 2
#
# Mean of squared residuals: 0.03072593
# % Var explained: -3.96
################################################################################
## Printed results obtained with v2.0.3 ##
# A rf ensemble of 4 base models: svmRadial, rf, pls, glm
#
# Ensemble results:
# Random Forest
#
# 500 samples
# 4 predictor
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times)
# Summary of sample sizes: 450, 450, 450, 450, 450, 450, ...
# Resampling results across tuning parameters:
#
# mtry RMSE Rsquared MAE
# 2 0.1106396 0.5841740 0.07552396
# 3 0.1106041 0.5831640 0.07403516
# 4 0.1109225 0.5809271 0.07317610
#
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 3.
At first glance it seems that the number of samples used for resampling is different, i.e. 50 for 4.0.1 and 500 for 2.0.3. Doesn't 4.0.1 take into account the number of repeats (repeats=10)?
The problem for me now is to find out whether the results obtained with version 2.0.3, which consistently look better, are reliable, and whether there is a way of reproducing them exactly with the new version of the software.
Is there any other suggestion I can follow to solve this problem? Thank you in advance.
What happens with repeats=1?
Here are the results with repeats=1:
# ##### Version 4.0.1
#
# The following models were ensembled: svmRadial, rf, pls, glm
#
# caret::train model:
# Random Forest
#
# 50 samples
# 4 predictor
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 45, 45, 45, 44, 46, 45, ...
# Resampling results across tuning parameters:
#
# mtry RMSE Rsquared MAE
# 2 0.1685696 0.2384783 0.1363316
# 3 0.1705608 0.2168169 0.1388907
# 4 0.1720041 0.1626698 0.1401095
#
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 2.
#
# Final model:
#
# Call:
# randomForest(x = x, y = y, mtry = param$mtry)
# Type of random forest: regression
# Number of trees: 500
# No. of variables tried at each split: 2
#
# Mean of squared residuals: 0.03081381
# % Var explained: -4.26
#
#
#
# ##### Version 2.0.3
#
# A rf ensemble of 4 base models: svmRadial, rf, pls, glm
#
# Ensemble results:
# Random Forest
#
# 50 samples
# 4 predictor
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 44, 44, 46, 45, 45, 46, ...
# Resampling results across tuning parameters:
#
# mtry RMSE Rsquared MAE
# 2 0.1850652 0.3609027 0.1423471
# 3 0.1881701 0.3832344 0.1456612
# 4 0.1876367 0.3819812 0.1448195
#
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 2.
Can you simplify your code more? If you remove all but one model in the list, do the results still differ?
If you remove the stack and just look at the list, do the results differ?
Etc
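One quick check along these lines (a sketch, not code from the thread): confirm whether the two versions even generate the same CV folds under the same seed, by saving the resampling index from each session and comparing.
set.seed(123)
ctrl <- caret::trainControl(method = "repeatedcv", number = 10, repeats = 1,
                            savePredictions = "final")
m <- caretEnsemble::caretList(
  x = iris[1:50, 1:2], y = iris[1:50, 3],
  trControl = ctrl, methodList = "glm"
)
# caret stores the folds it actually used in the fitted model's control
saveRDS(m$glm$control$index, "folds_this_version.rds")
# In the other session (other package version), run the same code and compare:
# identical(readRDS("folds_this_version.rds"), m$glm$control$index)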
Here is a more simplified example:
set.seed(123)
models <- caretList(
  x = iris[1:50, 1:2],
  y = iris[1:50, 3],
  trControl = trainControl(
    method = "repeatedcv",
    number = 10,
    repeats = 1,
    savePredictions = "final"
  ),
  methodList = c('glm')
)
models$glm
### Version 4.0.1
# Generalized Linear Model
#
# No pre-processing
# Resampling results:
#
# RMSE Rsquared MAE
# 0.1698624 0.3091359 0.1315746
models$glm$resample
# RMSE Rsquared MAE Resample
# 1 0.25038795 0.758396285 0.19940477 Fold01.Rep1
# 2 0.26133110 0.193946172 0.21313059 Fold02.Rep1
# 3 0.13418496 0.456645047 0.11436877 Fold03.Rep1
# 4 0.18777545 0.561684343 0.14561714 Fold04.Rep1
# 5 0.20148024 0.054303271 0.13901066 Fold05.Rep1
# 6 0.07797887 0.171861843 0.05607183 Fold06.Rep1
# 7 0.07728170 0.611740910 0.06159649 Fold07.Rep1
# 8 0.22979578 0.009091235 0.15238127 Fold08.Rep1
# 9 0.14529244 0.121730934 0.12106501 Fold09.Rep1
# 10 0.13311569 0.151958817 0.11309911 Fold10.Rep1
### Version 2.0.3
# Generalized Linear Model
#
# 50 samples
# 2 predictor
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 45, 45, 45, 45, 45, 46, ...
# Resampling results:
#
# RMSE Rsquared MAE
# 0.1644285 0.257963 0.1264851
models$glm$resample
# RMSE Rsquared MAE Resample
# 1 0.16189444 0.584288689 0.13352029 Fold01.Rep1
# 2 0.24314587 0.005362388 0.17808495 Fold02.Rep1
# 3 0.08704300 0.102801366 0.06901606 Fold03.Rep1
# 4 0.09737736 0.129642566 0.08078300 Fold04.Rep1
# 5 0.21961301 0.194360262 0.15093683 Fold05.Rep1
# 6 0.19606388 0.198342342 0.17020265 Fold06.Rep1
# 7 0.13870038 0.307985868 0.11587159 Fold07.Rep1
# 8 0.09314910 0.746812486 0.07969144 Fold08.Rep1
# 9 0.21929397 0.245055016 0.15167377 Fold09.Rep1
# 10 0.18800362 0.064979455 0.13507018 Fold10.Rep1
So the overall RMSE is comparable between the two, although the results still differ, and even more so if you look at each fold... I will try with other models...
Below I've changed the model and the resampling method:
set.seed(123)
models <- caretList(
  x = iris[1:50, 1:2],
  y = iris[1:50, 3],
  trControl = trainControl(
    method = "cv",
    number = 5,
    savePredictions = "final"
  ),
  methodList = c('pls')
)
models$pls
models$pls$resample
### Version 4.0.1
# > models$pls
# Partial Least Squares
#
# No pre-processing
# Resampling results:
#
# RMSE Rsquared MAE
# 0.1757146 0.1515 0.1345077
#
# Tuning parameter 'ncomp' was held constant at a value of 1
# >
# > models$pls$resample
# RMSE Rsquared MAE Resample
# 1 0.1118829 0.085753622 0.08732348 Fold1
# 2 0.1925753 0.142191146 0.14345631 Fold2
# 3 0.1317513 0.007128456 0.11463952 Fold3
# 4 0.2211226 0.498756038 0.17546404 Fold4
# 5 0.2212410 0.023670539 0.15165538 Fold5
### Version 2.0.3
# > models$pls
# Partial Least Squares
#
# 50 samples
# 2 predictor
#
# No pre-processing
# Resampling: Cross-Validated (5 fold)
# Summary of sample sizes: 41, 40, 40, 40, 39
# Resampling results:
#
# RMSE Rsquared MAE
# 0.1585734 0.1667382 0.1238585
#
# Tuning parameter 'ncomp' was held constant at a value of 1
# > models$pls$resample
# RMSE Rsquared MAE Resample
# 1 0.17173504 0.139669963 0.14545932 Fold1
# 2 0.19414357 0.002362969 0.14549188 Fold2
# 3 0.08472362 0.366659898 0.06373038 Fold3
# 4 0.10989319 0.323403074 0.08959403 Fold4
# 5 0.23237159 0.001595291 0.17501666 Fold5
Ok, the first issue (repeats=10, 45 vs 450 observations) is because of this feature I added:
Allow ensemble of models with different resampling strategies, so long as they were trained on the same data.
To implement this, I take the average of each row's out-of-fold predictions across folds and repeats. With one repeat, nothing should change, as each observation is in exactly one test set (and nine training sets). But with 10 repeats, rather than stacking the entire dataset once per repeat (so 10x in this case), we instead take the average across repeats, so the resulting dataset is still the same size as the original data (see the toy illustration after the list below).
This serves 2 purposes:
- Allows mixing of different resampling strategies in one stack, so long as each strategy produces at least one prediction per row.
- Allows scaling to larger datasets with repeats and bootstrap sampling, as the stacked dataset no longer grows linearly in size with the number of repeats or bootstrap samples.
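To make the size difference concrete, here is a toy illustration (not the package's internal code) of the two stacking datasets for repeated CV, assuming a caret-style prediction table with rowIndex and Resample columns:
library(data.table)
set.seed(1)
# Out-of-fold predictions for 5 observations under CV repeated 3 times:
# each observation gets exactly one prediction per repeat.
oof <- data.table(
  rowIndex = rep(1:5, times = 3),
  pred     = rnorm(15),
  Resample = rep(c("Rep1", "Rep2", "Rep3"), each = 5)
)
# 2.0.3-style stacking data: keep every repeat's prediction (rows grow 3x)
nrow(oof)  # 15
# 4.x-style stacking data: average across repeats, one row per observation
stacked <- oof[, .(pred = mean(pred)), by = rowIndex]
nrow(stacked)  # 5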
In your second example, I suspect the issue is still in the trainControl. Could you also try:
- Setting the metric in caretList/caretStack to ROC
- Using twoClassSummary in the caretList/caretStack trainControl
If that doesn't work, try also manually constructing the re-sampling indexes.
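For concreteness, a minimal sketch of what those three suggestions could look like together, on toy two-class data (a sketch, not code from the thread):
library(caret)
# Toy two-class data: two iris species, two predictors
dat <- droplevels(iris[iris$Species != "setosa", c("Sepal.Length", "Sepal.Width", "Species")])
# Manually constructed resampling indexes, reusable across both package versions
set.seed(123)
folds <- createMultiFolds(y = dat$Species, k = 10, times = 10)
ctrl <- trainControl(
  method = "repeatedcv",
  index = folds,
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  savePredictions = "final"
)
models <- caretEnsemble::caretList(
  x = dat[, 1:2],
  y = dat$Species,
  metric = "ROC",
  trControl = ctrl,
  methodList = c("svmRadial", "rf")
)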
To use a classification metric, I have to change the example. Below is a simple example for binary classification (the code is the same for both versions, 4.0.1 and 2.0.3):
# Set the seed for reproducibility:
set.seed(123)
# Number of samples:
n <- 92
# Generate two hypothetical predictors (X1 and X2) with a normal distribution:
X1 <- rnorm(n, mean = 0, sd = 1)
X2 <- rnorm(n, mean = 0, sd = 1)
# We define a simple rule for generating a binary class based on the predictors
# Here, Class (variable to be predicted) is "yes" if a linear combination of the predictors exceeds a certain threshold:
linear_combination <- 0.5 * X1 + 0.3 * X2
threshold <- 0
Class <- as.factor(ifelse(linear_combination > threshold, "yes", "no"))
# Class levels are balanced:
table(Class)
# Class
# no yes
# 50 42
# Create a data frame:
data <- data.frame(X1, X2, Class)
# The first few rows:
head(data)
# X1 X2 Class
# 1 -0.56047565 0.2387317 no
# 2 -0.23017749 -0.6279061 no
# 3 1.55870831 1.3606524 yes
# 4 0.07050839 -0.6002596 no
# 5 0.12928774 2.1873330 yes
# 6 1.71506499 1.5326106 yes
## Now we apply the caretList and caretStack for a classification
set.seed(123)
models <- caretList(
  x = data[, 1:2],
  y = data[, 3],
  trControl = trainControl(
    method = "repeatedcv",
    number = 10,
    repeats = 10,
    classProbs = TRUE,
    summaryFunction = twoClassSummary,
    savePredictions = "final"
  ),
  methodList = c('svmRadial', 'rf')
)
caretStack(
  models,
  method = "rf",
  trControl = trainControl(
    method = "repeatedcv",
    number = 10,
    repeats = 10,
    classProbs = TRUE,
    summaryFunction = twoClassSummary,
    savePredictions = "final"
  )
)
############## Results for version 4.0.1
# note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
#
# The following models were ensembled: svmRadial, rf
#
# caret::train model:
# Random Forest
#
# 92 samples
# 2 predictor
# 2 classes: 'no', 'yes'
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times)
# Summary of sample sizes: 83, 83, 83, 83, 83, 83, ...
# Resampling results:
#
# ROC Sens Spec
# 0.98335 0.982 0.9185
#
# Tuning parameter 'mtry' was held constant at a value of 2
#
# Final model:
#
# Call:
# randomForest(x = x, y = y, mtry = param$mtry)
# Type of random forest: classification
# Number of trees: 500
# No. of variables tried at each split: 2
#
# OOB estimate of error rate: 5.43%
# Confusion matrix:
# no yes class.error
# no 49 1 0.0200000
# yes 4 38 0.0952381
############## Results for Version 2.0.3
# note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
#
# A rf ensemble of 2 base models: svmRadial, rf
#
# Ensemble results:
# Random Forest
#
# 920 samples
# 2 predictor
# 2 classes: 'no', 'yes'
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times)
# Summary of sample sizes: 828, 828, 828, 828, 828, 828, ...
# Resampling results:
#
# ROC Sens Spec
# 0.9956952 0.9824 0.955
#
# Tuning parameter 'mtry' was held constant at a value of 2
I still used repeated cross-validation because it generally provides more stable and reliable performance estimates, especially with small or noisy datasets (like my real data) or when very precise performance metrics are needed for model selection, even though it is the resampling method that produces the largest differences between the two versions of the software.
As can be seen from the results with both versions, excellent classification performance is achieved, although version 2.0.3 seems to perform slightly better than 4.0.1.
In your opinion, is it possible in a new version of the software to leave it up to the user to choose how to implement the stacking strategy, while making them aware (perhaps with a brief explanation in the help page) of the differences between the two possible choices? In my specific case (both the examples shown here and the real ones I am working on), I am actually getting better results with the method implemented in 2.0.3.
Unless there are indications of possible bugs that require an update to the latest version, it might be useful to have both stacking strategies.
Ok, so here's how you get the metrics from one model in a caretList to match
4.0.1
caret_6.0-94, caretEnsemble_4.0.1
| ncomp | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|
| 1 | 0.1585734 | 0.1667382 | 0.1238585 | 0.0606387 | 0.1728704 | 0.0456651 |
2.0.3
caret_6.0-94 caretEnsemble_2.0.3
devtools::install_github("zachmayer/caretEnsemble@2.0.3")
| ncomp | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|
| 1 | 0.1585734 | 0.1667382 | 0.1238585 | 0.0606387 | 0.1728704 | 0.0456651 |
Script
rm(list=ls(all=T))
gc(reset=T)
data(iris)
set.seed(123)
NROWS <- 50
NFOLDS <- 5
TARGET <- 3
XVARS <- 1:2
x <- iris[1:NROWS, XVARS]
y <- iris[1:NROWS, TARGET]
models <- caretEnsemble::caretList(
  x = x,
  y = y,
  metric = 'RMSE',
  trControl = caret::trainControl(
    method = "cv",
    index = caret::createFolds(y = y, k = NFOLDS, returnTrain = TRUE),
    returnResamp = "final",
    savePredictions = "final"
  ),
  methodList = c('pls')
)
knitr::kable(models$pls$results)
print(sessionInfo())
And here it is for repeated CV. They again match for one model in this simple scenario.
4.0.1
| ncomp | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|
| 1 | 0.1680096 | 0.1655174 | 0.1262566 | 0.0414542 | 0.1510426 | 0.0289888 |
caret_6.0-94 caretEnsemble_4.0.1
2.0.3
caret_6.0-94 caretEnsemble_2.0.3
| ncomp | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|
| 1 | 0.1680096 | 0.1655174 | 0.1262566 | 0.0414542 | 0.1510426 | 0.0289888 |
Script
rm(list=ls(all=T))
gc(reset=T)
data(iris)
set.seed(123)
NROWS <- 50
NREPEATS <- 5
NFOLDS <- 5
TARGET <- 3
XVARS <- 1:2
x <- iris[1:NROWS, XVARS]
y <- iris[1:NROWS, TARGET]
models <- caretEnsemble::caretList(
  x = x,
  y = y,
  metric = 'RMSE',
  trControl = caret::trainControl(
    method = "cv",
    index = caret::createMultiFolds(y = y, k = NFOLDS, times = NREPEATS),
    returnResamp = "final",
    savePredictions = "final"
  ),
  methodList = c('pls')
)
knitr::kable(models$pls$results)
print(sessionInfo())
So you should be able to get the caretList results to match with both versions. The caretStack results will continue to differ, however, due to the changes I made to the stacking logic.
I don't love the idea of maintaining 2 versions of the stacking code. This package is already so complex that it literally took me a decade to get around to adding multiclass support. I would really like to keep the code as simple as possible, because that means in the future I will more likely be able to add new features and fix bugs with my very, very limited time.
If you really need the old stacking logic, just run devtools::install_github("zachmayer/caretEnsemble@2.0.3") — the code is very simple and I don't see any reason why the old version won't continue to work for another 5-10 years into the future.
The other option, if you wish to take it on, is to modify extractBestPreds and caretList.
extractBestPreds
Add an argument for aggregate_resamples, and set the default to TRUE. Then put this block of code inside an if (aggregate_resamples) branch:
# If we have multiple resamples per row
# e.g. for repeated CV, we need to average the predictions
keys <- "rowIndex"
data.table::setkeyv(pred, keys)
pred <- pred[, lapply(.SD, aggregate_mean_or_first), by = keys]
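A self-contained sketch of the intended flag behaviour (the helper name here is hypothetical; the real change would live inside extractBestPreds and use the package's own aggregation helper):
library(data.table)
# When aggregate_resamples is TRUE, collapse to one row per observation;
# when FALSE, keep every resample's prediction (the 2.0.3-style behaviour).
aggregate_preds <- function(pred, aggregate_resamples = TRUE) {
  if (aggregate_resamples) {
    keys <- "rowIndex"
    data.table::setkeyv(pred, keys)
    pred <- pred[, lapply(.SD, mean), by = keys]
  }
  pred
}
pred <- data.table(rowIndex = rep(1:3, 2), pred = rnorm(6))
nrow(aggregate_preds(copy(pred), aggregate_resamples = TRUE))   # 3
nrow(aggregate_preds(copy(pred), aggregate_resamples = FALSE))  # 6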
caretList
Add an argument for aggregate_resamples, and set the default to TRUE. Pass the argument through the lapply loop to caretTrain in this block of code:
# Loop through the tuneLists and fit caret models with those specs
modelList <- lapply(tuneList, caretTrain, global_args = global_args, continue_on_fail = continue_on_fail, trim = trim)
names(modelList) <- names(tuneList)
nulls <- vapply(modelList, is.null, logical(1L))
modelList <- modelList[!nulls]
Unit tests
- Make sure existing tests pass with make lint spell test.
- Add new unit tests for aggregate_resamples that specifically check that the repeated CV case returns all resamples.
- When make lint spell test passes with your new test, make sure make all passes too.
Pull Request
For now, thank you very much for your time and patience. I very much appreciate all the valuable suggestions you have given and will try to take my time to look at everything.
You're welcome— I'm glad to see people using my package!
Better instructions to fix it here: https://github.com/zachmayer/caretEnsemble/issues/371
This PR will add the new feature: https://github.com/zachmayer/caretEnsemble/pull/374
You will need to re-install from main once it merges, and then pass aggregate_resamples=FALSE to caretList and caretStack.
Please try the new version with aggregate_resamples=FALSE, which should allow you to do what you want.
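For example, once the PR is merged into main, usage along these lines should restore the old behaviour (a sketch based on the argument name above; reinstall the package from main first):
# devtools::install_github("zachmayer/caretEnsemble")  # main, after the PR merges
library(caret)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     savePredictions = "final")
models <- caretEnsemble::caretList(
  x = iris[1:50, 1:2],
  y = iris[1:50, 3],
  trControl = ctrl,
  methodList = c("svmRadial", "rf", "pls", "glm"),
  aggregate_resamples = FALSE
)
ens <- caretEnsemble::caretStack(
  models,
  method = "rf",
  trControl = ctrl,
  aggregate_resamples = FALSE
)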