
Account for tibble (non-)drop behavior in aoa

Open gpatoine opened this issue 3 years ago • 0 comments

First, thanks for your work on CAST. It is a very nice package and I am looking forward to further developments.

I recently ran into an issue while trying to run the tutorial https://cran.r-project.org/web/packages/CAST/vignettes/AOA-tutorial.html with my own data. I ran the function aoa, but the AOA$AOA results were only zeros.

AOA <- aoa(newdata = newdata, model = mod1, returnTrainDI = TRUE, cl = cl)

I found the issue was that I am using a tibble when training the model as below:

mod1 <- train(x = mytbl[,predictorNames], 
               y = mytbl$response,
               method = "rf",
               importance = TRUE,
               tuneGrid = expand.grid(mtry = c(2:length(predictorNames))),
               trControl = trainControl(method = "cv", savePredictions = TRUE))

Because of that, model$trainingData is also a tibble, and on line 168 of aoa.R, newdata[,catvar] becomes NA, because I have one categorical predictor. A tibble has different dropping behavior than a data.frame when a single column is selected. Specifically, unique(train[,catvar]) returns a one-column tibble instead of a vector. https://github.com/HannaMeyer/CAST/blob/b34bc3526226b9a9bee5111d684d68dcf07d0432/R/aoa.R#L168
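To illustrate the dropping difference described above, here is a minimal sketch using toy data. The objects df and tbl and the column name f are made up for the example and are not taken from CAST or from my data; it assumes the tibble package is installed.

```r
library(tibble)

# Toy data with one categorical column (hypothetical, for illustration only)
df  <- data.frame(x = 1:3, f = c("a", "b", "a"), stringsAsFactors = FALSE)
tbl <- as_tibble(df)

# Single-column selection: a data.frame drops to a vector, a tibble does not
df_col  <- df[, "f"]
tbl_col <- tbl[, "f"]
is.data.frame(df_col)   # FALSE: plain character vector
is.data.frame(tbl_col)  # TRUE: still a one-column tibble

# unique() therefore returns different kinds of objects
u_df  <- unique(df[, "f"])   # character vector "a" "b"
u_tbl <- unique(tbl[, "f"])  # tibble with 2 rows and 1 column

# Ways to get a plain vector out of a tibble
v1 <- tbl[["f"]]
v2 <- tbl[, "f", drop = TRUE]
```

Code that expects a vector from train[,catvar] will therefore behave differently depending on whether train is a data.frame or a tibble.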

The solution for me was to call mytbl <- as.data.frame(mytbl) before training the model, but I would suggest doing this conversion at the beginning of the aoa function so that it also handles tibbles robustly:

if(is.null(train)){train <- as.data.frame(model$trainingData)}
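As a rough check of what the suggested line would do, here is a small sketch with a stand-in object. The list called model below is a hypothetical substitute for a real caret model (only its trainingData element is mimicked), and the column names are invented; it assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical stand-in for a trained model: only the trainingData slot is mimicked
model <- list(trainingData = tibble(f = c("a", "b", "a"), .outcome = c(1, 2, 3)))
train  <- NULL
catvar <- "f"

# Proposed coercion: training data becomes a plain data.frame
if (is.null(train)) {
  train <- as.data.frame(model$trainingData)
}

# Single-column selection now drops to a vector again
levels_seen <- unique(train[, catvar])
is.character(levels_seen)                 # TRUE
known <- c("a", "c") %in% levels_seen     # TRUE FALSE
```

With the coercion in place, the membership test compares new values against the actual category values rather than against a one-column tibble.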

I don't have a ready reprex but I hope my description is sufficient to understand the issue.

gpatoine, Jan 19 '21 12:01