
2.0.3 vs 4.0+ stacking results differ

Open BioinfoMonzino opened this issue 1 year ago • 22 comments

This is probably a general and naive question: I recently used caretEnsemble (both the caretList and caretStack functions) for a classification analysis with the new version 4.0.1, but when I compared the results with those from the same script under version 2.0.3, I noticed a dramatic decrease in classification performance. How can I reproduce the same results with the newer version of caretEnsemble?

BioinfoMonzino avatar Oct 11 '24 12:10 BioinfoMonzino

I changed the trainControl defaults in 4.0. Instead of 25 bootstrap resamples (the caret default), I'm now using 5- or 10-fold CV.

It's a little convoluted, but you can install the old release, run your code, manually inspect the trainControl you get in the old model, and then install the new release and manually set that train control.

In general, it's a good idea to make sure you're explicitly setting the trainControl, both for caretStack and for caretList. That will help guarantee it stays the same over time.
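
For example, here's a minimal sketch of my own (not from this thread) that sets an explicit control for both steps, using 5-fold CV purely for illustration:

library(caret)
library(caretEnsemble)

# One explicit control for the base models, one for the stack
base_control  <- trainControl(method = "cv", number = 5, savePredictions = "final")
stack_control <- trainControl(method = "cv", number = 5, savePredictions = "final")

models <- caretList(
  Sepal.Length ~ ., data = iris,
  trControl = base_control,
  methodList = c("glm", "rpart")
)

stack <- caretStack(models, method = "glm", trControl = stack_control)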

zachmayer avatar Oct 11 '24 12:10 zachmayer

Here's an example:

# Fit the old model
devtools::install_github("zachmayer/[email protected]")
set.seed(42)
models_2.0.3 <- caretEnsemble::caretList(
  Sepal.Length ~ ., 
  data = iris, 
  methodList = c("glm", "rpart")
)
print(sapply(models_2.0.3, function(x) min(x$results$RMSE)))
#       glm     rpart 
# 0.3134526 0.4315253 

# Save trainControl to a file
train_control <- models_2.0.3[[1]]$control
saveRDS(train_control, file = "train_control.rds")

# Fit the new model (need to start a new session for this)
devtools::install_github("zachmayer/[email protected]")
loaded_train_control <- readRDS("train_control.rds")
set.seed(42)
models_4.0.0 <- caretEnsemble::caretList(
  Sepal.Length ~ ., 
  data = iris, 
  trControl = loaded_train_control,
  methodList = c("glm", "rpart")
)
print(sapply(models_4.0.0, function(x) min(x$results$RMSE)))
#       glm     rpart 
# 0.3134526 0.4315253 

zachmayer avatar Oct 11 '24 12:10 zachmayer

First of all, thank you for your prompt reply! As you suggested, I explicitly set the trainControl (method='repeatedcv', number=10, repeats=10) for both caretList and caretStack, but as I mentioned, classification performance with version 4.0.1 still decreased a lot compared to version 2.0.3. Now, when I try to run the above example on my data, I get a "Bad seeds" error, and I'm trying to figure out why...

BioinfoMonzino avatar Oct 11 '24 13:10 BioinfoMonzino

trainControl sets random seeds for each CV fold. Make sure your new code and your old code use the same number of folds.

Get caretList working first, then work on caretStack!
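
If the "Bad seeds" error persists, one option is to build the seeds list yourself so its length matches the resampling setup. This is a minimal sketch of my own (not from the thread), assuming 10-fold CV repeated 10 times and at most 25 tuning combinations per model:

# caret expects seeds to be a list of length (number of resamples + 1):
# one integer vector per resample (length >= number of tuning combinations),
# plus a single integer for the final model fit.
set.seed(42)
n_resamples <- 10 * 10   # folds x repeats
n_tune <- 25             # upper bound on tuning combinations per model
seeds <- vector("list", n_resamples + 1)
for (i in seq_len(n_resamples)) seeds[[i]] <- sample.int(10000, n_tune)
seeds[[n_resamples + 1]] <- sample.int(10000, 1)

my_control <- caret::trainControl(
  method = "repeatedcv", number = 10, repeats = 10,
  seeds = seeds, savePredictions = "final"
)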

zachmayer avatar Oct 11 '24 14:10 zachmayer

Yes, this is exactly what I did, but the two versions still return different results... I will keep trying to work it out...

BioinfoMonzino avatar Oct 11 '24 15:10 BioinfoMonzino

Good luck!

zachmayer avatar Oct 11 '24 18:10 zachmayer

Dear Zach, I don't want to bother you again or be inappropriate, but I wanted to revisit the problem of reproducing results across the two versions of caretEnsemble, v2.0.3 versus v4.0.1.

I reused the example data from above, but changed the resampling method and added more models for a regression problem. Again, running the same code on the two versions of the software produced very different results.

Below is the code used with both software versions:

set.seed(123)
models <- caretList(
  x=iris[1:50,1:2],
  y=iris[1:50,3],
  trControl=trainControl(method="repeatedcv", 
                         number=10, 
                         repeats=10, 
                         savePredictions="final"),
  methodList=c('svmRadial', 'rf', 'pls', 'glm')
)
caretStack(models, 
           method="rf", 
           trControl=trainControl(method="repeatedcv", 
                                  number=10, 
                                  repeats=10, 
                                  savePredictions="final"))

## Printed Results obtained with v4.0.1 ##

# The following models were ensembled: svmRadial, rf, pls, glm  
# 
# caret::train model:
#   Random Forest 
# 
# 50 samples
# 4 predictor
# 
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times) 
# Summary of sample sizes: 43, 46, 45, 45, 46, 45, ... 
# Resampling results across tuning parameters:
#   
#   mtry  RMSE       Rsquared   MAE      
# 2     0.1663337  0.2667371  0.1318132
# 3     0.1666885  0.2725491  0.1319602
# 4     0.1675535  0.2799366  0.1332284
# 
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 2.
# 
# Final model:
#   
#   Call:
#   randomForest(x = x, y = y, mtry = param$mtry) 
# Type of random forest: regression
# Number of trees: 500
# No. of variables tried at each split: 2
# 
# Mean of squared residuals: 0.03072593
# % Var explained: -3.96

################################################################################

## Printed results obtained with v2.0.3 ##

# A rf ensemble of 4 base models: svmRadial, rf, pls, glm
# 
# Ensemble results:
#   Random Forest 
# 
# 500 samples
# 4 predictor
# 
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times) 
# Summary of sample sizes: 450, 450, 450, 450, 450, 450, ... 
# Resampling results across tuning parameters:
#   
#   mtry  RMSE       Rsquared   MAE       
# 2     0.1106396  0.5841740  0.07552396
# 3     0.1106041  0.5831640  0.07403516
# 4     0.1109225  0.5809271  0.07317610
# 
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 3.

At first glance it seems that the number of samples used for resampling is different, i.e. 50 for 4.0.1 and 500 for 2.0.3. Doesn't 4.0.1 take into account the number of repeats (repeats=10)?

The problem for me now is to work out whether the results obtained with version 2.0.3, which consistently look better, are reliable, and whether there is a way to reproduce them exactly with the new version of the software.

Is there any other suggestion I can follow to solve this problem? Thank you in advance.

BioinfoMonzino avatar Oct 14 '24 10:10 BioinfoMonzino

What happens with repeats=1?

zachmayer avatar Oct 14 '24 12:10 zachmayer

Here are the results with repeats=1:

# ##### Version 4.0.1
# 
# The following models were ensembled: svmRadial, rf, pls, glm  
# 
# caret::train model:
#   Random Forest 
# 
# 50 samples
# 4 predictor
# 
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times) 
# Summary of sample sizes: 45, 45, 45, 44, 46, 45, ... 
# Resampling results across tuning parameters:
#   
#   mtry  RMSE       Rsquared   MAE      
# 2     0.1685696  0.2384783  0.1363316
# 3     0.1705608  0.2168169  0.1388907
# 4     0.1720041  0.1626698  0.1401095
# 
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 2.
# 
# Final model:
#   
#   Call:
#   randomForest(x = x, y = y, mtry = param$mtry) 
# Type of random forest: regression
# Number of trees: 500
# No. of variables tried at each split: 2
# 
# Mean of squared residuals: 0.03081381
# % Var explained: -4.26
# 
# 
# 
# ##### Version 2.0.3
# 
# A rf ensemble of 4 base models: svmRadial, rf, pls, glm
# 
# Ensemble results:
#   Random Forest 
# 
# 50 samples
# 4 predictor
# 
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times) 
# Summary of sample sizes: 44, 44, 46, 45, 45, 46, ... 
# Resampling results across tuning parameters:
#   
#   mtry  RMSE       Rsquared   MAE      
# 2     0.1850652  0.3609027  0.1423471
# 3     0.1881701  0.3832344  0.1456612
# 4     0.1876367  0.3819812  0.1448195
# 
# RMSE was used to select the optimal model using the smallest value.
# The final value used for the model was mtry = 2.

BioinfoMonzino avatar Oct 14 '24 12:10 BioinfoMonzino

Can you simplify your code more? If you remove all but one model in the list, do the results still differ?

If you remove the stack and just look at the list, do the results differ?

Etc

zachmayer avatar Oct 14 '24 13:10 zachmayer

Here is a more simplified example:

set.seed(123)
models <- caretList(
  x=iris[1:50,1:2],
  y=iris[1:50,3],
  trControl=trainControl(method="repeatedcv", 
                         number=10, 
                         repeats=1, 
                         savePredictions="final"),
  methodList=c('glm')
)

models$glm

### Version 4.0.1

# Generalized Linear Model 
# 
# No pre-processing
# Resampling results:
#   
#   RMSE       Rsquared   MAE      
# 0.1698624  0.3091359  0.1315746

models$glm$resample

# RMSE    Rsquared        MAE    Resample
# 1  0.25038795 0.758396285 0.19940477 Fold01.Rep1
# 2  0.26133110 0.193946172 0.21313059 Fold02.Rep1
# 3  0.13418496 0.456645047 0.11436877 Fold03.Rep1
# 4  0.18777545 0.561684343 0.14561714 Fold04.Rep1
# 5  0.20148024 0.054303271 0.13901066 Fold05.Rep1
# 6  0.07797887 0.171861843 0.05607183 Fold06.Rep1
# 7  0.07728170 0.611740910 0.06159649 Fold07.Rep1
# 8  0.22979578 0.009091235 0.15238127 Fold08.Rep1
# 9  0.14529244 0.121730934 0.12106501 Fold09.Rep1
# 10 0.13311569 0.151958817 0.11309911 Fold10.Rep1


### Version 2.0.3

# Generalized Linear Model 
# 
# 50 samples
# 2 predictor
# 
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times) 
# Summary of sample sizes: 45, 45, 45, 45, 45, 46, ... 
# Resampling results:
#   
#   RMSE       Rsquared  MAE      
# 0.1644285  0.257963  0.1264851

models$glm$resample

# RMSE    Rsquared        MAE    Resample
# 1  0.16189444 0.584288689 0.13352029 Fold01.Rep1
# 2  0.24314587 0.005362388 0.17808495 Fold02.Rep1
# 3  0.08704300 0.102801366 0.06901606 Fold03.Rep1
# 4  0.09737736 0.129642566 0.08078300 Fold04.Rep1
# 5  0.21961301 0.194360262 0.15093683 Fold05.Rep1
# 6  0.19606388 0.198342342 0.17020265 Fold06.Rep1
# 7  0.13870038 0.307985868 0.11587159 Fold07.Rep1
# 8  0.09314910 0.746812486 0.07969144 Fold08.Rep1
# 9  0.21929397 0.245055016 0.15167377 Fold09.Rep1
# 10 0.18800362 0.064979455 0.13507018 Fold10.Rep1

So the overall RMSE is comparable between the two versions, although the results still differ, and even more so if you look at each individual fold. ...I will try with other models...

BioinfoMonzino avatar Oct 14 '24 13:10 BioinfoMonzino

Below I've changed the model and the resampling method:

set.seed(123)
models <- caretList(
  x=iris[1:50,1:2],
  y=iris[1:50,3],
  trControl=trainControl(method="cv", 
                         number=5, 
                         savePredictions="final"),
  methodList=c('pls')
)

models$pls

models$pls$resample

### Version 4.0.1 

# > models$pls
# Partial Least Squares 
# 
# No pre-processing
# Resampling results:
#   
#   RMSE       Rsquared  MAE      
# 0.1757146  0.1515    0.1345077
# 
# Tuning parameter 'ncomp' was held constant at a value of 1
# > 
#   > models$pls$resample
# RMSE    Rsquared        MAE Resample
# 1 0.1118829 0.085753622 0.08732348    Fold1
# 2 0.1925753 0.142191146 0.14345631    Fold2
# 3 0.1317513 0.007128456 0.11463952    Fold3
# 4 0.2211226 0.498756038 0.17546404    Fold4
# 5 0.2212410 0.023670539 0.15165538    Fold5

### Version 2.0.3

# > models$pls
# Partial Least Squares 
# 
# 50 samples
# 2 predictor
# 
# No pre-processing
# Resampling: Cross-Validated (5 fold) 
# Summary of sample sizes: 41, 40, 40, 40, 39 
# Resampling results:
#   
#   RMSE       Rsquared   MAE      
# 0.1585734  0.1667382  0.1238585
# 
# Tuning parameter 'ncomp' was held constant at a value of 1
# > models$pls$resample
# RMSE    Rsquared        MAE Resample
# 1 0.17173504 0.139669963 0.14545932    Fold1
# 2 0.19414357 0.002362969 0.14549188    Fold2
# 3 0.08472362 0.366659898 0.06373038    Fold3
# 4 0.10989319 0.323403074 0.08959403    Fold4
# 5 0.23237159 0.001595291 0.17501666    Fold5

BioinfoMonzino avatar Oct 14 '24 13:10 BioinfoMonzino

OK, the first issue (repeats=10; roughly 45 vs. 450 observations per resampling fold) is due to this feature I added: allowing an ensemble of models with different resampling strategies, so long as they were trained on the same data.

To implement this, I take the average of each row's out-of-fold predictions across folds and repeats. With one repeat, nothing should change, since each observation appears in exactly one test set (and nine training sets). But with 10 repeats, rather than stacking the entire dataset once per repeat (10x the data in this case), we average across repeats, so the resulting stacked dataset stays the same size as the original data (see the sketch after the list below).

This serves 2 purposes:

  1. Allows mixing of different resampling strategies in one stack, so long as each strategy produces at least one prediction per row.
  2. Allows scaling to larger datasets with repeats and bootstrap sampling, as the stacked dataset no longer grows linearly with the number of repeats or bootstrap samples.
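
The aggregation amounts to something like the following toy illustration (my own sketch with made-up predictions, not the package source):

library(data.table)

# Hypothetical out-of-fold predictions from one base model: repeated CV with
# two repeats, so every rowIndex appears twice.
pred <- data.table(
  rowIndex = c(1, 1, 2, 2, 3, 3),
  pred     = c(0.40, 0.44, 0.90, 0.86, 0.10, 0.14),
  obs      = c(0.42, 0.42, 0.88, 0.88, 0.12, 0.12)
)

# Average within each rowIndex: the stacked data keeps one row per observation
agg <- pred[, lapply(.SD, mean), by = "rowIndex"]
print(agg)  # 3 rows, the same size as the original data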

zachmayer avatar Oct 14 '24 15:10 zachmayer

In your second example, I suspect the issue is still in the trainControl. Could you also try:

  1. Setting the metric in caretList/caretStack to ROC
  2. Use twoClassSummary in the caretList/caretStack trainControl

If that doesn't work, also try manually constructing the resampling indexes, as in the sketch below.
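
Here is a minimal sketch of that last suggestion (my own, using the same 50-row iris subset as above): build the fold indexes once and pass the identical object to trainControl under both versions.

set.seed(123)
y <- iris[1:50, 3]

# 10-fold CV repeated 10 times; the index list fully determines the folds
cv_index <- caret::createMultiFolds(y = y, k = 10, times = 10)

my_control <- caret::trainControl(
  method = "cv",
  index = cv_index,  # reuse this exact object with both 2.0.3 and 4.0.1
  savePredictions = "final"
)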

zachmayer avatar Oct 14 '24 15:10 zachmayer

To use a classification metric, I have to change the example. Below is a simple binary classification example (the code is the same for both versions, 4.0.1 and 2.0.3):

# Set the seed for reproducibility:
set.seed(123)

# Number of samples:
n <- 92

# Generate two hypothetical predictors (X1 and X2) with a normal distribution:
X1 <- rnorm(n, mean = 0, sd = 1)
X2 <- rnorm(n, mean = 0, sd = 1)

# We define a simple rule for generating a binary class based on the predictors
# Here, Class (variable to be predicted) is "yes" if a linear combination of the predictors exceeds a certain threshold:
linear_combination <- 0.5 * X1 + 0.3 * X2
threshold <- 0
Class <- as.factor(ifelse(linear_combination > threshold, "yes", "no"))
# Class levels are balanced:
table(Class)

# Class
# no yes 
# 50  42

# Create a data frame:
data <- data.frame(X1, X2, Class)

# The first few rows:
head(data)

# X1         X2 Class
# 1 -0.56047565  0.2387317    no
# 2 -0.23017749 -0.6279061    no
# 3  1.55870831  1.3606524   yes
# 4  0.07050839 -0.6002596    no
# 5  0.12928774  2.1873330   yes
# 6  1.71506499  1.5326106   yes

## Now we apply the caretList and caretStack for a classification

set.seed(123)
models <- caretList(
  x=data[,1:2],
  y=data[,3],
  trControl=trainControl(method="repeatedcv", 
                         number=10, 
                         repeats=10, 
                         classProbs=TRUE,
                         summaryFunction = twoClassSummary,
                         savePredictions="final"),
  methodList=c('svmRadial', 'rf')

)


caretStack(models, 
           method="rf", 
           trControl=trainControl(method="repeatedcv", 
                                  number=10, 
                                  repeats=10, 
                                  classProbs=TRUE,
                                  summaryFunction = twoClassSummary,
                                  savePredictions="final"))

############## Results for version 4.0.1 

# note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
# 
# The following models were ensembled: svmRadial, rf  
# 
# caret::train model:
#   Random Forest 
# 
# 92 samples
# 2 predictor
# 2 classes: 'no', 'yes' 
# 
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times) 
# Summary of sample sizes: 83, 83, 83, 83, 83, 83, ... 
# Resampling results:
#   
#   ROC      Sens   Spec  
# 0.98335  0.982  0.9185
# 
# Tuning parameter 'mtry' was held constant at a value of 2
# 
# Final model:
#   
#   Call:
#   randomForest(x = x, y = y, mtry = param$mtry) 
# Type of random forest: classification
# Number of trees: 500
# No. of variables tried at each split: 2
# 
# OOB estimate of  error rate: 5.43%
# Confusion matrix:
#   no yes class.error
# no  49   1   0.0200000
# yes  4  38   0.0952381

############## Results for Version 2.0.3 

# note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
# 
# A rf ensemble of 2 base models: svmRadial, rf
# 
# Ensemble results:
#   Random Forest 
# 
# 920 samples
# 2 predictor
# 2 classes: 'no', 'yes' 
# 
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 10 times) 
# Summary of sample sizes: 828, 828, 828, 828, 828, 828, ... 
# Resampling results:
#   
#   ROC        Sens    Spec 
# 0.9956952  0.9824  0.955
# 
# Tuning parameter 'mtry' was held constant at a value of 2


I still used repeated cross-validation because it generally provides more stable and reliable performance estimates, especially with small or noisy datasets (like my real data) or when very precise performance metrics are needed for model selection, even though it is the resampling method that produces the largest differences between the two versions of the software.

BioinfoMonzino avatar Oct 15 '24 09:10 BioinfoMonzino

As can be seen from the results with both versions, excellent classification performance is achieved, although version 2.0.3 seems to perform slightly better than 4.0.1.

In your opinion, is it possible in a new version of the software to leave it up to the user to choose how to implement the stacking strategy, while making them aware (perhaps with a brief explanation in the help page) of the differences between the two possible choices? In my specific case (both the examples shown here and the real ones I am working on), I am actually getting better results with the method implemented in 2.0.3.

Unless there are indications of possible bugs that require an update to the latest version, it might be useful to have both stacking strategies.

BioinfoMonzino avatar Oct 15 '24 09:10 BioinfoMonzino

OK, so here's how to get the metrics from one model in a caretList to match across the two versions:

4.0.1

caret_6.0-94, caretEnsemble_4.0.1

ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
1 0.1585734 0.1667382 0.1238585 0.0606387 0.1728704 0.0456651

2.0.3

caret_6.0-94 caretEnsemble_2.0.3

devtools::install_github("zachmayer/[email protected]")

ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
1 0.1585734 0.1667382 0.1238585 0.0606387 0.1728704 0.0456651

Script

rm(list=ls(all=T))
gc(reset=T)
data(iris)
set.seed(123)

NROWS <- 50
NFOLDS <- 5
TARGET <- 3
XVARS <- 1:2

x <- iris[1:NROWS, XVARS]
y <- iris[1:NROWS, TARGET]

models <- caretEnsemble::caretList(
  x=x,
  y=y,
  metric='RMSE',
  trControl=caret::trainControl(
    method="cv", 
    index=caret::createFolds(y=y, k=NFOLDS, returnTrain = TRUE),
    returnResamp = "final",
    savePredictions="final"),
  methodList=c('pls')
)

knitr::kable(models$pls$results)
print(sessionInfo())

zachmayer avatar Oct 15 '24 13:10 zachmayer

And here it is for repeated CV. They again match for one model in this simple scenario.

4.0.1

ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
1 0.1680096 0.1655174 0.1262566 0.0414542 0.1510426 0.0289888

caret_6.0-94 caretEnsemble_4.0.1

2.0.3

caret_6.0-94 caretEnsemble_2.0.3

ncomp RMSE Rsquared MAE RMSESD RsquaredSD MAESD
1 0.1680096 0.1655174 0.1262566 0.0414542 0.1510426 0.0289888

Script

rm(list=ls(all=T))
gc(reset=T)
data(iris)
set.seed(123)

NROWS <- 50
NREPEATS <- 5
NFOLDS <- 5
TARGET <- 3
XVARS <- 1:2

x <- iris[1:NROWS, XVARS]
y <- iris[1:NROWS, TARGET]

models <- caretEnsemble::caretList(
  x=x,
  y=y,
  metric='RMSE',
  trControl=caret::trainControl(
    method="cv", 
    index=caret::createMultiFolds(y=y, k=NFOLDS, times=NREPEATS),
    returnResamp = "final",
    savePredictions="final"),
  methodList=c('pls')
)

knitr::kable(models$pls$results)
print(sessionInfo())

zachmayer avatar Oct 15 '24 13:10 zachmayer

So you should be able to get the caretList results to match across both versions. The caretStack results will continue to differ, however, due to the changes I made to the stacking logic.

I don't love the idea of maintaining two versions of the stacking code. This package is already so complex that it literally took me a decade to get around to adding multiclass support. I would really like to keep the code as simple as possible, because that makes it more likely I'll be able to add new features and fix bugs with my very, very limited time.

If you really need the old stacking logic, just run devtools::install_github("zachmayer/[email protected]"). The code is very simple, and I don't see any reason why the old version won't continue to work for another 5-10 years.

zachmayer avatar Oct 15 '24 13:10 zachmayer

The other option, if you wish to take it on, is to modify extractBestPreds and caretList.

extractBestPreds

Add an aggregate_resamples argument and set the default to TRUE. Then put this block of code inside an if (aggregate_resamples) branch:

# If we have multiple resamples per row
# e.g. for repeated CV, we need to average the predictions
keys <- "rowIndex"
data.table::setkeyv(pred, keys)
pred <- pred[, lapply(.SD, aggregate_mean_or_first), by = keys]
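
As a toy illustration of the conditional (my own sketch, with mean standing in for the internal aggregate_mean_or_first helper; not the package code):

library(data.table)

aggregate_preds <- function(pred, aggregate_resamples = TRUE) {
  if (aggregate_resamples) {
    # With repeated CV there are multiple resamples per row:
    # average them so there is one prediction per original observation
    keys <- "rowIndex"
    data.table::setkeyv(pred, keys)
    pred <- pred[, lapply(.SD, mean), by = keys]
  }
  pred  # with FALSE, every resample is kept (the 2.0.3-style behavior)
}

pred <- data.table(rowIndex = c(1, 1, 2, 2), pred = c(0.4, 0.5, 0.8, 0.9))
nrow(aggregate_preds(copy(pred), TRUE))   # 2 rows (aggregated)
nrow(aggregate_preds(copy(pred), FALSE))  # 4 rows (all resamples kept)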

caretList

Add an aggregate_resamples argument and set the default to TRUE. Pass the argument through the lapply loop to caretTrain in this block of code:

# Loop through the tuneLists and fit caret models with those specs
modelList <- lapply(tuneList, caretTrain, global_args = global_args, continue_on_fail = continue_on_fail, trim = trim)
names(modelList) <- names(tuneList)
nulls <- vapply(modelList, is.null, logical(1L))
modelList <- modelList[!nulls]

Unit tests

  1. Make sure existing tests pass with make lint spell test.
  2. Add new unit tests for aggregate_resamples that specifically check that the repeated-CV case returns all resamples.
  3. When make lint spell test passes with your new test, make sure make all passes too.

Pull Request

  1. Bump the package version to 4.0.2
  2. Update NEWS
  3. Make a Pull Request on GitHub and verify that the PR tests pass.

zachmayer avatar Oct 15 '24 14:10 zachmayer

For now, thank you very much for your time and patience. I very much appreciate all the valuable suggestions you have given and will try to take my time to look at everything.

BioinfoMonzino avatar Oct 15 '24 14:10 BioinfoMonzino

You're welcome— I'm glad to see people using my package!

zachmayer avatar Oct 15 '24 14:10 zachmayer

Better instructions to fix it here: https://github.com/zachmayer/caretEnsemble/issues/371

zachmayer avatar Dec 14 '24 14:12 zachmayer

This PR will add the new feature: https://github.com/zachmayer/caretEnsemble/pull/374

You will need to re-install from main once it merges, and then pass aggregate_resamples=FALSE to caretList and caretStack.
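
A hedged usage sketch (the argument name comes from the PR description above; its exact placement may differ in the merged version):

# devtools::install_github("zachmayer/caretEnsemble")  # re-install from main
models <- caretEnsemble::caretList(
  x = iris[1:50, 1:2],
  y = iris[1:50, 3],
  trControl = caret::trainControl(method = "repeatedcv", number = 10,
                                  repeats = 10, savePredictions = "final"),
  methodList = c("pls", "glm"),
  aggregate_resamples = FALSE  # keep every resample, as in 2.0.3-style stacking
)
stack <- caretEnsemble::caretStack(models, method = "glm",
                                   aggregate_resamples = FALSE)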

zachmayer avatar Dec 17 '24 16:12 zachmayer

Please try the new version with aggregate_resamples=FALSE, which should allow you to do what you want.

zachmayer avatar Dec 17 '24 21:12 zachmayer