
issue : CaretEnsemble with different trained models

Open zee86 opened this issue 8 years ago • 1 comments

Hi. In the following code I create five imputed datasets, fit an SVM to each one using the train function in caret, and then try to combine the resulting models with caretEnsemble, so that in the end I can predict each test set with the ensemble model. However, I get the following error: Error in check_bestpreds_obs(modelLibrary) : Observed values for each component model are not the same. Please re-train the models with the same Y variable

Is there any way to make caretEnsemble accept models trained on different datasets, or do you know of any R package that would let me ensemble these models?

I appreciate any help. Thank you.

library(mice)
library(missForest) # provides prodNA(); not loaded in the original post
library(e1071)
library(caret)
library("caretEnsemble")

data <- iris
# Generate 10% missing values at random
iris.mis <- prodNA(iris, noNA = 0.1)
# Remove categorical variables
iris.mis <- subset(iris.mis, select = -c(Species))

#5 Imputation using mice pmm

imp <- mice(iris.mis, m=5, maxit = 10, method = 'pmm', seed = 500)

# Save the 5 imputed datasets
x1 <- complete(imp, action = 1, include = FALSE)
x2 <- complete(imp, action = 2, include = FALSE)
x3 <- complete(imp, action = 3, include = FALSE)
x4 <- complete(imp, action = 4, include = FALSE)
x5 <- complete(imp, action = 5, include = FALSE)

##Apply the following method for each imputed set

form <- iris$Sepal.Width # target column
n <- nrow(x1) # all imputed datasets have the same number of rows
fold <- 10 # assumed value; fold is not defined in the original post
prop <- n %/% fold
set.seed(7)
newseq <- rank(runif(n))
k <- as.factor((newseq - 1) %/% prop + 1)
i <- 1
CVfolds <- 10
CVrepeats <- 3
indexPreds <- createMultiFolds(x1[k != i, ]$Sepal.Width, CVfolds, CVrepeats)
ctrl <- trainControl(method = "repeatedcv", repeats = CVrepeats, number = CVfolds, returnResamp = "all", savePredictions = "all", index = indexPreds)

fit1 <- train(Sepal.Width ~ ., data = x1[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit2 <- train(Sepal.Width ~ ., data = x2[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit3 <- train(Sepal.Width ~ ., data = x3[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit4 <- train(Sepal.Width ~ ., data = x4[k != i, ], method = 'svmLinear2', trControl = ctrl)
fit5 <- train(Sepal.Width ~ ., data = x5[k != i, ], method = 'svmLinear2', trControl = ctrl)

# Combine the fitted models into a list
svm.fit <- list(fit1, fit2, fit3, fit4, fit5)

# Convert the list to a caretList
class(svm.fit) <- "caretList"

# Create the ensemble (this is where the error occurs)
svm.all <- caretEnsemble(svm.fit, method = 'svmLinear2')

# Predict the test set using the ensemble
fcast1 <- predict(svm.all, newdata = x1[k == i, ])

zee86, Dec 07 '16 19:12

Observed values for each component model are not the same. Please re-train the models with the same Y variable

caretEnsemble currently requires that all the models have the same target variable. In this case, it looks like you end up with different target variables because you are using different imputations.

zachmayer, Dec 07 '16 20:12
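
A possible workaround, which nobody in this thread proposed or confirmed: in the code above, Sepal.Width is still in iris.mis when the missing values are generated, so the target is imputed too and differs between x1 to x5. The sketch below leaves the target fully observed, imputes only the predictors, fits one SVM per imputed dataset, and pools the five predictions with a plain average instead of caretEnsemble. The hold-out rows in test_idx, the 5-fold CV setting, and the use of rowMeans as the pooling rule are all illustrative assumptions, not anything caretEnsemble does.

```r
library(mice)
library(missForest) # for prodNA()
library(e1071)      # backend needed by caret's 'svmLinear2'
library(caret)

set.seed(500)

# Assumption: missing values occur in the predictors only, never in the target.
predictors <- subset(iris, select = -c(Species, Sepal.Width))
predictors.mis <- prodNA(predictors, noNA = 0.1)

# Five imputations of the predictors, same settings as in the question
imp <- mice(predictors.mis, m = 5, maxit = 10, method = "pmm",
            seed = 500, printFlag = FALSE)

# Hypothetical hold-out rows (every 10th observation)
test_idx <- seq(1, nrow(iris), by = 10)
ctrl <- trainControl(method = "cv", number = 5)

# One model per imputed dataset; each column of preds holds that model's
# predictions for the hold-out rows of its own imputed dataset.
preds <- sapply(1:5, function(j) {
  xj <- complete(imp, action = j)
  xj$Sepal.Width <- iris$Sepal.Width # identical observed target in every dataset
  fit <- train(Sepal.Width ~ ., data = xj[-test_idx, ],
               method = "svmLinear2", trControl = ctrl)
  predict(fit, newdata = xj[test_idx, ])
})

# Pool by simple averaging across the five models
pooled <- rowMeans(preds)
```

Because every dataset now shares the same observed target, the five models are at least trained on the same Y variable. Whether that alone is enough for caretEnsemble to accept them would still need to be checked against the package.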