Added option to impute missing values during training
Issues
- Resolves #301.
Change(s) made
- In `preprocess.R`, we added an option called `impute_in_preprocessing`, which defaults to `TRUE`. When it is `TRUE`, preprocessing calls `impute()` to impute missing values in the data. When it is `FALSE`, this step is skipped.
- We created a new file, `impute.R`, and moved into it the code from `preprocess.R` that is now shared between files. It contains two functions: `prep_data()`, which gets the dataset into the proper format for imputation, and `impute()`, which takes a dataset and performs the imputation.
- In `run_ml()`, we added an option called `impute_in_training`, which defaults to `FALSE`. When it is `TRUE`, `run_ml()` calls `prep_data()` and performs the imputation after the train/test split. When it is `FALSE`, the input dataset is not modified.
- We added test cases to `test-run_ml.R` and `test-preprocess.R` to check that both options behave as expected.
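
To illustrate the idea behind imputing after the split, here is a minimal base R sketch. It is not the code in `impute.R`: the helper `impute_median_sketch()` and the toy data are hypothetical, and the choice to fill the test set with medians learned from the training set is an assumption about the intended behavior rather than something this PR states.

```r
# Hypothetical helper (NOT the PR's impute()): replace NAs in numeric columns
# with the per-column median computed from a reference data frame.
impute_median_sketch <- function(dat, reference = dat) {
  numeric_cols <- names(dat)[vapply(dat, is.numeric, logical(1))]
  for (col in numeric_cols) {
    med <- stats::median(reference[[col]], na.rm = TRUE)
    dat[[col]][is.na(dat[[col]])] <- med
  }
  dat
}

# Toy dataset with missing values in two continuous features.
toy <- data.frame(
  outcome = rep(c("a", "b"), each = 5),
  feat1 = c(1, 2, NA, 4, 5, 6, NA, 8, 9, 10),
  feat2 = c(NA, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, NA)
)

# Split first, so no information from the test rows is used for imputation.
set.seed(2019)
train_idx <- sample(nrow(toy), size = 8)
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Assumption: medians are learned from the training rows only and then
# applied to both partitions.
train_imputed <- impute_median_sketch(train)
test_imputed <- impute_median_sketch(test, reference = train)
```

Imputing during preprocessing would instead compute the medians from all rows before the split, which is the behavior `impute_in_preprocessing = TRUE` keeps as the default.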
Checklist
(~Strikethrough~ any points that are not applicable.)
- [X] Write unit tests for any new functionality or bug fixes.
- Update docs if there are any API changes:
  - [ ] roxygen comments
  - [ ] vignettes
- [X] Update NEWS.md if this includes any user-facing changes.
- [ ] The check workflow succeeds on your most recent commit. This is always required before the PR can be merged.
These are API changes, so we'll need to document how to use them in the 'preprocess' vignette and roxygen comments.
Tip: You can run devtools::check() to see if your local version of the package passes the checks successfully, even before you commit and push your changes.
https://r-pkgs.org/whole-game.html#check
Would you like us to include information about how to use the option we added to the run_ml function in the 'preprocess' vignette, or should this be included elsewhere?
I would discuss it under a new subheading in the preprocess vignette, probably nested under the heading "Replace missing continuous data with the median value of that feature" (maybe also rephrase that heading to make it more succinct). Show how to call both preprocess_data and run_ml in one code chunk.
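
As a starting point, the vignette chunk might look roughly like the sketch below. It assumes a build of mikropml that includes the two options added in this PR, and it uses the bundled `otu_mini_bin` dataset with the `glmnet` method and small `kfold`/`cv_times` values purely for illustration. The guard is only there so the chunk does nothing on installs that lack the new arguments.

```r
# Sketch only: check that the installed mikropml exposes this PR's options.
has_pr_options <- requireNamespace("mikropml", quietly = TRUE) &&
  "impute_in_training" %in% names(formals(mikropml::run_ml)) &&
  "impute_in_preprocessing" %in% names(formals(mikropml::preprocess_data))

if (has_pr_options) {
  # Skip imputation during preprocessing...
  preproc <- mikropml::preprocess_data(
    dataset = mikropml::otu_mini_bin,
    outcome_colname = "dx",
    impute_in_preprocessing = FALSE
  )

  # ...and impute after the train/test split inside run_ml() instead.
  results <- mikropml::run_ml(
    preproc$dat_transformed,
    method = "glmnet",
    outcome_colname = "dx",
    impute_in_training = TRUE,
    kfold = 2,
    cv_times = 2,
    seed = 2019
  )
}
```

In the vignette itself the guard could be dropped and a dataset that actually contains missing values used, so that the effect of the two options is visible.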