Topcuoglu_ML_mBio_2020 make permutation importance test optional, functionality to use any outcome variables, optional hyperparameter input

Fixes #6 Fixes #7 Fixes #8 Fixes #10

Nov 11 '19 18:11 zenalapp

get_results(dataset, models, split_number, outcome="dx") errors out when the user defines outcome, but pipeline(dataset, models, split_number, outcome="dx") works. So the issue must be in how the argument is passed from get_results to the pipeline function.

Nov 11 '19 22:11 BTopcuoglu

Defined the outcome variable and the permutation logical as arguments that can be passed on the command line. We can now run from the command line with:

Rscript code/learning/main.R 1 "L2_Logistic_Regression" "dx" 0
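
As a rough sketch of how a script could read those four positional arguments: the helper name parse_cli_args is hypothetical, and the argument order (split number, model, outcome, permutation flag) is an assumption read off the example command, not taken from main.R itself.

```r
# Hypothetical helper: turn positional command-line arguments into named, typed values.
# Assumed order (from the example command): split number, model, outcome, permutation flag.
parse_cli_args <- function(args) {
  if (length(args) != 4) {
    stop("expected 4 arguments: split_number model outcome perm", call. = FALSE)
  }
  list(
    split_number = as.integer(args[1]),
    model        = args[2],
    outcome      = args[3],
    perm         = as.logical(as.integer(args[4]))  # "0" -> FALSE, "1" -> TRUE
  )
}

# Inside a script this would be: parse_cli_args(commandArgs(trailingOnly = TRUE))
parsed <- parse_cli_args(c("1", "L2_Logistic_Regression", "dx", "0"))
```

Returning a named list keeps the later calls readable (parsed$outcome, parsed$perm) and makes a wrong argument count fail early.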

Nov 12 '19 16:11 BTopcuoglu

The previous changes for setting outcome and perm on the command line work when the user passes the outcome (e.g. "dx") as an argument. However, if they leave that argument out (so it is passed as NA, e.g. Rscript code/learning/main.R 1 "L2_Logistic_Regression" 0), the rest of the pipeline breaks because the first column does not get selected as the outcome.

I changed the order of the arguments in the get_aucs function and used NA instead of NULL:

get_results <- function(dataset, models, split_number, perm=T, outcome=NA, hyperparameters=NULL)

Changed pipeline function to have NA as well.

pipeline <- function(dataset, model, split_number, outcome=NA, hyperparameters=NULL, perm=T)
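
Since the two signatures list outcome, hyperparameters and perm in different orders, forwarding them by name avoids positional mix-ups. A minimal sketch with stub bodies follows; the bodies are placeholders for illustration, not the project's actual code.

```r
# Stub with the signature shown above; it only reports what it received.
pipeline <- function(dataset, model, split_number, outcome = NA,
                     hyperparameters = NULL, perm = TRUE) {
  list(model = model, outcome = outcome, perm = perm)
}

# Stub wrapper: optional arguments are forwarded by name, so their
# position in either signature does not matter.
get_results <- function(dataset, models, split_number, perm = TRUE,
                        outcome = NA, hyperparameters = NULL) {
  pipeline(dataset, models, split_number,
           outcome = outcome,
           hyperparameters = hyperparameters,
           perm = perm)
}

res <- get_results(data.frame(dx = c("a", "b")), "L2_Logistic_Regression", 1,
                   perm = FALSE, outcome = "dx")
```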

Removed the outcome=NULL default from the tuning_grid function, because the outcome should already be decided in the previous functions:

tuning_grid <- function(train_data, model, outcome, hyperparameters=NULL)
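
One way to resolve the outcome once, before tuning_grid is called, is sketched below. The helper resolve_outcome is hypothetical, and the fallback to the first column is an assumption based on the behaviour described above. Note that an NA default can be tested with is.na() inside if(), whereas is.na(NULL) returns a zero-length logical and makes if() fail.

```r
# Hypothetical helper: pick the outcome column name.
# Assumption: when no outcome is given (NA), the first column is the outcome.
resolve_outcome <- function(dataset, outcome = NA) {
  if (is.na(outcome)) {
    outcome <- colnames(dataset)[1]
  }
  if (!outcome %in% colnames(dataset)) {
    stop("outcome '", outcome, "' is not a column of the dataset", call. = FALSE)
  }
  outcome
}

# Toy data for illustration only
toy <- data.frame(dx = c("cancer", "normal"), Otu1 = c(1, 2))

default_outcome <- resolve_outcome(toy)
chosen_outcome  <- resolve_outcome(toy, "Otu1")
```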

Nov 12 '19 16:11 BTopcuoglu

With these changes, it looks like we fixed #6 and #7. The next step is checking whether permutation works as we want in #8.

Nov 12 '19 17:11 BTopcuoglu

permutation_importance function doesn't work. Error:

Error in -sym(first_outcome) : invalid argument to unary operator
Calls: get_results ... <Anonymous> -> vars_select_eval -> map_if -> map -> .f

Caught the bug in line 87 and changed from first_outcome to outcome:

  non_correlated_otus <- full %>%
    select(-correlated_otus) %>%
    select(-sym(outcome)) %>%
    colnames()

Nov 12 '19 17:11 BTopcuoglu

Nooo I was worried this wouldn't work but I wasn't doing permutation importance so I didn't catch it. I can look into another option if you don't know of one

Nov 12 '19 17:11 zenalapp

> Nooo I was worried this wouldn't work but I wasn't doing permutation importance so I didn't catch it. I can look into another option if you don't know of one

I'm running it now with the change I've made. It was passing first_outcome, which is not a column that can be selected, but it now uses sym("dx"), which should work in theory. I'll keep you posted.

Nov 12 '19 17:11 BTopcuoglu

But don't we want to have dx not hard-coded?

Nov 12 '19 17:11 zenalapp

> But don't we want to have dx not hard-coded?

No I know, I have it as:

  non_correlated_otus <- full %>%
    select(-correlated_otus) %>%
    select(-sym(outcome)) %>%
    colnames()

Instead of what it was before:

  non_correlated_otus <- full %>%
    select(-correlated_otus) %>%
    select(-sym(first_outcome)) %>%
    colnames()

Nov 12 '19 17:11 BTopcuoglu

Our previous attempt was unsuccessful - must be a bug with tidyverse. I made a new change:

  non_correlated_otus <- full %>%
    select(-correlated_otus)
  
  non_correlated_otus[,outcome] <- NULL
  
  non_correlated_otus <- non_correlated_otus %>%
    colnames()

Not the most beautiful code snippet I've written but it'll do:)
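
For comparison, the same column names can be computed in base R without any select() call. This is only a sketch on toy data, assuming correlated_otus is a character vector of column names and outcome is a single string. One possible reading of the earlier error is that unary minus was applied directly to the symbol returned by sym(), which R does not allow; with tidyselect 1.0.0 or later, select(-all_of(outcome)) is the usual way to drop a column whose name is stored in a string.

```r
# Toy data for illustration only; the real `full` data frame is not shown in this thread.
full <- data.frame(
  dx   = c("cancer", "normal", "cancer"),
  Otu1 = c(1, 2, 3),
  Otu2 = c(2, 4, 6),
  Otu3 = c(5, 1, 0)
)
correlated_otus <- c("Otu2")  # assumed: character vector of column names
outcome <- "dx"               # user-defined outcome column

# Keep every column name that is neither correlated nor the outcome
non_correlated_otus <- setdiff(colnames(full), c(correlated_otus, outcome))
```

setdiff() works on names only, so it never touches the data and does not depend on how the column name is quoted.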

Nov 12 '19 20:11 BTopcuoglu

For #8: we have now made permutation importance optional, but the data structure used to run the permutation is still hard-coded. We need to come back and fix that.

I'll now check whether user-defined hyperparameters work (#10).

Nov 13 '19 18:11 BTopcuoglu

The NULL (default) options for hyperparameters are currently specific to my CRC classification problem, except for random forest, where we use Pat's code: mtry <- floor(seq(1, n_features, length=6)). So those need to be adjusted and expanded in the future, but overall I'm able to set up user-defined hyperparameters as a list, and it works. Fixed #10.
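
A minimal sketch of that fallback for random forest: the function name rf_grid and its arguments are hypothetical, and only the mtry default comes from the line quoted above. A user-supplied list is expanded into a grid; NULL triggers the default.

```r
# Hypothetical helper: build a random-forest tuning grid.
# hyperparameters = NULL falls back to the default mtry sequence quoted above.
rf_grid <- function(n_features, hyperparameters = NULL) {
  if (is.null(hyperparameters)) {
    hyperparameters <- list(mtry = floor(seq(1, n_features, length.out = 6)))
  }
  # expand.grid accepts a named list and returns one row per combination
  expand.grid(hyperparameters)
}

default_grid <- rf_grid(n_features = 11)
custom_grid  <- rf_grid(n_features = 11, hyperparameters = list(mtry = c(2, 5, 8)))
```

Passing the hyperparameters as a named list means the same code path can handle models with more than one tuning parameter, since expand.grid builds every combination.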

Nov 13 '19 19:11 BTopcuoglu