Make the permutation importance test optional, allow any outcome variable, and accept optional hyperparameter input
Fixes #6 Fixes #7 Fixes #8 Fixes #10
get_results(dataset, models, split_number, outcome="dx") errors out when outcome is defined by user.
pipeline(dataset, models, split_number, outcome="dx") works, so the issue must be in how the argument is passed from get_results to the pipeline function.
Define outcome variable and permutation logical as arguments that can be passed in the command line. We can now run from command line with:
Rscript code/learning/main.R 1 "L2_Logistic_Regression" "dx" 0
The previous changes for setting outcome and perm on the command line work when the user passes the outcome (e.g. "dx") as an argument. However, if that argument is left out (so outcome is passed as NA, e.g. Rscript code/learning/main.R 1 "L2_Logistic_Regression" 0), the rest of the pipeline breaks because the first column does not get selected.
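For illustration, the two command-line forms above could be parsed roughly as follows. This is only a sketch: parse_cli_args is a hypothetical helper, not the code in main.R, and the argument order (split number, model, optional outcome, permutation flag) is assumed from the two example commands.

```r
# Hypothetical helper (not from main.R): turn the trailing command-line
# arguments into a named list.
# Assumed order: split number, model name, [outcome], permutation flag (0/1).
parse_cli_args <- function(args) {
  if (length(args) == 4) {
    outcome <- args[3]
    perm <- args[4]
  } else if (length(args) == 3) {
    outcome <- NA_character_  # outcome omitted on the command line
    perm <- args[3]
  } else {
    stop("Expected 3 or 4 arguments, got ", length(args))
  }
  list(
    split_number = as.integer(args[1]),
    model = args[2],
    outcome = outcome,
    perm = as.logical(as.integer(perm))
  )
}

# In a script this would be parse_cli_args(commandArgs(trailingOnly = TRUE)).
parsed <- parse_cli_args(c("1", "L2_Logistic_Regression", "dx", "0"))
```

With only three arguments, the sketch leaves outcome as NA so that downstream functions can fall back to their default.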
- I changed the order of the arguments in the get_aucs function and used NA instead of NULL:
get_results <- function(dataset, models, split_number, perm=T, outcome=NA, hyperparameters=NULL)
- Changed the pipeline function to have NA as well:
pipeline <- function(dataset, model, split_number, outcome=NA, hyperparameters=NULL, perm=T)
- Edited the outcome=NULL argument of the tuning_grid function, because the outcome info should already be decided in the previous functions:
tuning_grid <- function(train_data, model, outcome, hyperparameters=NULL)
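A minimal sketch of how an NA outcome could be resolved once, before tuning_grid is called. resolve_outcome is a hypothetical helper and the toy data frame is made up; the only behavior taken from this thread is that the first column is used when no outcome is given.

```r
# Hypothetical helper: fall back to the first column when no outcome is given,
# and fail early if the requested outcome is not a column of the dataset.
resolve_outcome <- function(dataset, outcome = NA) {
  if (is.na(outcome)) {
    outcome <- colnames(dataset)[1]
  }
  if (!outcome %in% colnames(dataset)) {
    stop("Outcome '", outcome, "' is not a column in the dataset")
  }
  outcome
}

# Toy data (made up for illustration)
toy <- data.frame(dx = c("cancer", "normal"), Otu1 = c(0.1, 0.2))
default_outcome <- resolve_outcome(toy)          # uses the first column
explicit_outcome <- resolve_outcome(toy, "Otu1")  # uses the user's choice
```

Resolving the outcome in one place means functions further down, such as tuning_grid, can take outcome as a required argument.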
With these changes, it looks like we fixed #6 and #7. The next step is checking whether permutation works as we want in #8.
permutation_importance function doesn't work.
Error:
Error in -sym(first_outcome) : invalid argument to unary operator
Calls: get_results ... <Anonymous> -> vars_select_eval -> map_if -> map -> .f
Caught the bug in line 87 and changed from first_outcome to outcome:
non_correlated_otus <- full %>%
select(-correlated_otus) %>%
select(-sym(outcome)) %>%
colnames()
Nooo I was worried this wouldn't work but I wasn't doing permutation importance so I didn't catch it. I can look into another option if you don't know of one
I'm running it now with the change I've made (it was passing first_outcome, which is not a column that can be selected, but now uses sym("dx"), which should work theoretically). I'll keep you posted.
But don't we want to have dx not hard-coded?
No I know, I have it as:
non_correlated_otus <- full %>%
select(-correlated_otus) %>%
select(-sym(outcome)) %>%
colnames()
Instead of what it was before:
non_correlated_otus <- full %>%
select(-correlated_otus) %>%
select(-sym(first_outcome)) %>%
colnames()
Our previous attempt was unsuccessful - must be a bug with tidyverse. I made a new change:
non_correlated_otus <- full %>%
select(-correlated_otus)
non_correlated_otus[,outcome] <- NULL
non_correlated_otus <- non_correlated_otus %>%
colnames()
Not the most beautiful code snippet I've written but it'll do:)
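Since only the column names are needed in the end, one possible alternative is a base R setdiff, sketched below on made-up toy data. As far as I can tell, the earlier error comes from the unary minus being applied to the symbol returned by sym(), so a tidyselect helper such as all_of() might also work, but that is an assumption I have not tested in this pipeline.

```r
# Toy data (made up for illustration); in the pipeline, `full` is the real
# dataset and `correlated_otus` comes from the correlation step.
full <- data.frame(
  dx = c("cancer", "normal", "cancer"),
  Otu1 = c(0.1, 0.2, 0.3),
  Otu2 = c(0.1, 0.2, 0.3),
  Otu3 = c(0.5, 0.1, 0.4)
)
correlated_otus <- c("Otu2")
outcome <- "dx"

# Keep every column name that is neither a correlated OTU nor the outcome.
non_correlated_otus <- setdiff(colnames(full), c(correlated_otus, outcome))

# Assumed dplyr equivalent (untested here):
# full %>% select(-all_of(c(correlated_otus, outcome))) %>% colnames()
```

This avoids modifying a copy of the data frame just to read its column names.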
#8: Permutation importance is now optional, but the data structure used to run the permutation is still hardcoded. We need to come back to that and fix it.
I'll now check whether user-defined hyperparameters work (#10).
The NULL options for hyperparameters are currently specific to my CRC classification problem (except random forest, where we implement Pat's code: mtry <- floor(seq(1, n_features, length=6)) ). So those need to be adjusted and expanded in the future, but overall, I'm able to set up user-defined hyperparameters as a list and it works. Fixed #10.
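A rough sketch of the NULL-versus-user-defined behavior for random forest. get_rf_grid is a hypothetical helper, not the actual tuning_grid code; the default mtry sequence is the one quoted above, and the assumption is that the user passes a list with an mtry element.

```r
# Hypothetical helper: build the random forest tuning grid.
# If no hyperparameters are supplied, use the default mtry sequence;
# otherwise use the mtry values from the user's list.
get_rf_grid <- function(n_features, hyperparameters = NULL) {
  if (is.null(hyperparameters)) {
    mtry <- floor(seq(1, n_features, length.out = 6))
  } else {
    mtry <- hyperparameters$mtry
  }
  expand.grid(mtry = mtry)
}

default_grid <- get_rf_grid(n_features = 11)
user_grid <- get_rf_grid(n_features = 11, hyperparameters = list(mtry = c(2, 4)))
```

The same pattern (check is.null, otherwise read from the list) could be repeated per model once the defaults are generalized beyond the CRC problem.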