recipeselectors
step_select_boruta and step_select_mrmr need method for internally handling NAs
step_select_boruta and step_select_mrmr cannot handle data with missing (NA) values. To use these feature selection steps, the user must first remove or impute the NAs in an earlier recipe step, which might not be desirable. It would be handy if step_select_boruta and step_select_mrmr could omit NAs internally, which would let the user preserve them in the training data.
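A minimal sketch of what "internally omitting NAs" could mean for a filter-based selector, written in Python/pandas rather than R purely for illustration: predictors are scored on complete cases only, but the returned training data keeps its NAs. The function name `select_on_complete_cases` is hypothetical, and the absolute-correlation score is a stand-in; it is not how Boruta or mRMR rank features, and it is not the recipeselectors API.

```python
import numpy as np
import pandas as pd


def select_on_complete_cases(df, outcome, k):
    """Hypothetical sketch: rank predictors using only rows without NAs,
    then return the top-k columns with their original NAs left intact."""
    predictors = [c for c in df.columns if c != outcome]
    # Listwise deletion is used for scoring only; df itself is not modified.
    complete = df.dropna(subset=predictors + [outcome])
    # Stand-in score: |Pearson correlation| with the outcome (NOT Boruta/mRMR).
    scores = complete[predictors].corrwith(complete[outcome]).abs()
    keep = scores.sort_values(ascending=False).index[:k].tolist()
    # Return the selected columns from the ORIGINAL data, so NAs are preserved.
    return df[keep + [outcome]], keep


df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "x2": [3.0, 1.0, 2.0, np.nan, 1.0, 3.0],
    "y":  [1.1, 2.0, 3.1, 3.9, 5.2, 6.0],
})
selected, kept = select_on_complete_cases(df, "y", k=1)
print(kept)
print(selected)
```

Listwise deletion like this can discard many rows when NAs are spread across predictors, so a real implementation might prefer pairwise handling or a scoring method that tolerates NAs natively.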
Is there a reason why adding a step_impute_ step before the filter-based step is undesirable? I guess if you specifically want your model to handle the missing values, e.g., with XGBoost, then you might not want the NAs imputed by another method? However, most steps in the 'recipes' package do not handle NAs, and given the diversity of imputation methods, I'm not sure that adding NA handling on a step-by-step basis is the way to go, given the composable style of tidymodels.
You could first add a 'missing' indicator column using step_indicate_na, impute the NAs, and then potentially add them back in if you want the model at the end of the pipeline to handle them using its own method.
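The suggested workflow (indicator columns, imputation, selection, then putting the NAs back) can be sketched in Python/pandas as an illustration of the idea only. The indicator names mimic the `na_ind_` prefix that step_indicate_na uses by default, median imputation stands in for whichever step_impute_ step you choose, and the absolute-correlation score stands in for Boruta/mRMR. The function `indicate_impute_select_restore` and its final "restore" step are hypothetical; this is not a recipes step.

```python
import numpy as np
import pandas as pd


def indicate_impute_select_restore(df, outcome, k):
    """Hypothetical sketch of: indicate NAs -> impute -> select -> restore NAs."""
    predictors = [c for c in df.columns if c != outcome]
    # 1. Missingness indicators (analogous to step_indicate_na).
    na_mask = df[predictors].isna()
    indicators = na_mask.astype(int).add_prefix("na_ind_")
    # 2. Median imputation (a stand-in for any step_impute_ step).
    imputed = df[predictors].fillna(df[predictors].median())
    # 3. Filter on the imputed data; stand-in score, NOT Boruta/mRMR.
    scores = imputed.corrwith(df[outcome]).abs()
    keep = scores.sort_values(ascending=False).index[:k].tolist()
    # 4. "Restore": take the kept columns from the original data, so the NAs
    #    are back for a model such as XGBoost that handles them itself.
    restored = df[keep]
    kept_indicators = indicators[["na_ind_" + c for c in keep]]
    return pd.concat([restored, kept_indicators, df[[outcome]]], axis=1), keep


df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "x2": [3.0, 1.0, 2.0, np.nan, 1.0, 3.0],
    "y":  [1.1, 2.0, 3.1, 3.9, 5.2, 6.0],
})
result, kept = indicate_impute_select_restore(df, "y", k=1)
print(kept)
print(result)
```

Note that the selection is made on imputed values, so the chosen features can differ from what a selector working directly on the NA-containing data would pick.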
Yes, it would be for developing a model capable of handling the NAs, e.g., XGBoost.
From: Steven Pawley @.> Sent: Friday, November 11, 2022 8:31 AM To: stevenpawley/recipeselectors @.> Cc: Ransom, Katherine M @.>; Author @.> Subject: [EXTERNAL] Re: [stevenpawley/recipeselectors] step_select_boruta and step_select_mrmr need method for internally handling NAs (Issue #11)