mixOmics
Fix for Issue #192
This PR is in response to the bug raised by @Ning-L in #192. This PR replaces #193 for consistency's sake.
This commit employs the fix suggested by @Ning-L: within the predict() function, the variables of the training and testing data are compared. If they contain the same variables in a different order, the testing variables are sorted into the matching order and the algorithm proceeds as normal. The same stop() call is raised if they do not contain the exact same set of variables.
Also added a test that ensures that even if one or more of the inputted test dataframes has its variables in a different order to the training set, the same predictions are made on the novel samples.
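As a rough illustration of the reordering idea described above (not the code in this commit), the check could look something like the following. The helper name reorder_to_match() and its error message are hypothetical, and the sketch assumes both inputs are matrices with unique column names.

```r
# Hypothetical helper: align the columns of `new` to the column order of `train`.
# Assumes both inputs are matrices (or data frames) with unique column names.
reorder_to_match <- function(train, new) {
  train_vars <- colnames(train)
  new_vars <- colnames(new)

  # Different sets of variables: there is nothing sensible to reorder.
  if (length(train_vars) != length(new_vars) || !setequal(train_vars, new_vars)) {
    stop("'newdata' must contain exactly the same variables as the training data")
  }

  # Same variables: index by name so the columns follow the training order.
  new[, train_vars, drop = FALSE]
}

# Toy data: same three variables, different column order.
train <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
new   <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("c", "a", "b")))

aligned <- reorder_to_match(train, new)
colnames(aligned)
```

Indexing by name rather than calling sort() means the result follows the training order, whatever that order happens to be.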
@Max-Bladen in such a scenario, it is best to be strict and ask users to ensure variables are consistent across the model and prediction data. If we don't, we will end up with a spaghetti codebase handling every possible scenario. Basically, when facing a user request that will add complexity to the codebase, we need to first ask: "is it reasonable to ask the user to handle it on their side?"
Also, in introducing changes, try to break it down into minimal and easy-to-follow sections (long and readable is better than short and unreadable). For instance, here:
define the function separately and document what it does, then use it across blocks. But I wouldn't worry about adding this specific change. Instead, I'd update the docs to inform users that we expect the same set of variables in the same order. It also gives them a chance to better inspect their data.
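For what handling it on the user's side might look like, here is a minimal sketch on toy data. The block and feature names are made up for illustration, and nothing here is part of mixOmics. With a fitted model, the reference order would presumably come from the training blocks used to fit it.

```r
# Toy stand-ins for the training blocks and the new data (two blocks each).
X.train <- list(
  mirna = matrix(rnorm(6), nrow = 2, dimnames = list(NULL, c("m1", "m2", "m3"))),
  mrna  = matrix(rnorm(4), nrow = 2, dimnames = list(NULL, c("g1", "g2")))
)
X.new <- list(
  mirna = matrix(rnorm(6), nrow = 2, dimnames = list(NULL, c("m3", "m1", "m2"))),
  mrna  = matrix(rnorm(4), nrow = 2, dimnames = list(NULL, c("g2", "g1")))
)

# Reorder every block of the new data by feature name, block by block.
# Indexing by name errors if a training feature is missing from the new
# data, but it silently drops extra features, so inspect the data first.
X.new.aligned <- Map(
  function(new, ref) new[, colnames(ref), drop = FALSE],
  X.new[names(X.train)],
  X.train
)

# Sanity check before calling predict(): names and order now match.
stopifnot(identical(lapply(X.new.aligned, colnames), lapply(X.train, colnames)))
```

Doing this explicitly keeps the package strict while still giving users a short, inspectable step to run on their own data.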
@aljabadi good suggestions. I can make a minor adjustment such that it informs users that the set of features is different or in a different order. Also, your advice regarding the parsability of our code is noted. I'll make sure to keep this in mind going forward.
Latest commit updates the predict() function to provide a slightly more specific error message. There are now different error messages for when the sets of features have differing values or just differing orders. It also now tells users which dataframe has the mismatch. Used a form of code which is a bit more readable too. The reprex below shows the three possible scenarios:
- perfect match between X and newdata, no error
- mismatching orders between X and newdata features, error
- mismatching sets of features between X and newdata, error
A follow-up commit will correct the test cases for these scenarios.
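The check described above can be sketched roughly as follows. This is an illustrative stand-in, not the code in the commit: check_block_features() and check_all_blocks() are hypothetical names, and the messages are modelled on the first line of the errors shown in the reprex output.

```r
# Hypothetical check: compare the feature names of one training block
# with the matching block of the new data.
check_block_features <- function(train_block, new_block, block_name) {
  train_vars <- colnames(train_block)
  new_vars <- colnames(new_block)

  # Scenario 1: identical names in identical order, nothing to report.
  if (identical(train_vars, new_vars)) {
    return(invisible(TRUE))
  }
  # Scenario 2: same features, different order.
  if (length(train_vars) == length(new_vars) && setequal(train_vars, new_vars)) {
    stop("The order of features in 'object$X$", block_name,
         "' is different to 'newdata$", block_name, "'.", call. = FALSE)
  }
  # Scenario 3: the sets of features differ.
  stop("The set of features in 'object$X$", block_name,
       "' is different to 'newdata$", block_name, "'.", call. = FALSE)
}

# Apply the same documented check to every block in turn.
check_all_blocks <- function(train_blocks, new_blocks) {
  for (block in names(train_blocks)) {
    check_block_features(train_blocks[[block]], new_blocks[[block]], block)
  }
  invisible(TRUE)
}

# Toy data covering the three scenarios.
train <- list(mrna = matrix(0, 2, 3, dimnames = list(NULL, c("g1", "g2", "g3"))))
reordered <- list(mrna = train$mrna[, c("g3", "g1", "g2")])
renamed <- train
colnames(renamed$mrna)[1] <- "random.feature.name"

check_all_blocks(train, train)  # no error
msg_order <- tryCatch(check_all_blocks(train, reordered), error = conditionMessage)
msg_set <- tryCatch(check_all_blocks(train, renamed), error = conditionMessage)
```

The actual behaviour of predict() on the breast.TCGA data is shown in the reprex that follows.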
library(mixOmics) # provides breast.TCGA, block.splsda() and the predict() method
data(breast.TCGA) # load in the data
# extract data
X.train = list(mirna = breast.TCGA$data.train$mirna,
               mrna = breast.TCGA$data.train$mrna)
X.test = list(mirna = breast.TCGA$data.test$mirna,
              mrna = breast.TCGA$data.test$mrna)
Y.train = breast.TCGA$data.train$subtype
# use optimal values from the case study on mixOmics.org
optimal.ncomp = 2
optimal.keepX = list(mirna = c(10, 5),
                     mrna = c(26, 16))
# set design matrix
design = matrix(0.1, ncol = length(X.train), nrow = length(X.train),
                dimnames = list(names(X.train), names(X.train)))
diag(design) = 0
# generate model
final.diablo.model = block.splsda(X = X.train, Y = Y.train, ncomp = optimal.ncomp, # set the optimised DIABLO model
                                  keepX = optimal.keepX, design = design)
#> Design matrix has changed to include Y; each block will be
#> linked to Y.
# create new test data with one dataframe being reordered
new.var.order = sample(1:dim(X.test$mrna)[2])
X.test.reorder <- X.test
X.test.reorder$mrna <- X.test.reorder$mrna[, new.var.order]
X.test.new.feat <- X.test
colnames(X.test.new.feat$mrna)[1] <- "random.feature.name"
# should raise no error
preds.normal <- predict(final.diablo.model, newdata = X.test)
# should raise error about mismatching ORDERS of features
preds.reordered <- predict(final.diablo.model, newdata = X.test.reorder)
#> Error in predict.block.spls(final.diablo.model, newdata = X.test.reorder):
#> The order of features in 'object$X$mrna' is different to 'newdata$mrna'.
#> Please ensure that you adjust these to the same order
# should raise error about mismatching SETS of features
preds.new.feature <- predict(final.diablo.model, newdata = X.test.new.feat)
#> Error in predict.block.spls(final.diablo.model, newdata = X.test.new.feat):
#> The set of features in 'object$X$mrna' is different to 'newdata$mrna'.
#> Please ensure they have the same features.
Created on 2022-04-25 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 Patched (2021-11-16 r81220)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_Australia.1252
#> ctype English_Australia.1252
#> tz Australia/Sydney
#> date 2022-04-25
#> pandoc 2.14.2 @ C:/Users/Work/AppData/Local/Pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> ! package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.3)
#> BiocParallel 1.28.3 2021-12-09 [1] Bioconductor
#> brio 1.1.3 2021-11-30 [1] CRAN (R 4.1.2)
#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.2)
#> callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.2)
#> cli 3.2.0 2022-02-14 [1] CRAN (R 4.1.2)
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.2)
#> corpcor 1.6.10 2021-09-16 [1] CRAN (R 4.1.1)
#> crayon 1.5.0 2022-02-14 [1] CRAN (R 4.1.2)
#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.3)
#> desc 1.4.1 2022-03-06 [1] CRAN (R 4.1.2)
#> devtools * 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> dplyr 1.0.8 2022-02-08 [1] CRAN (R 4.1.2)
#> ellipse 0.4.2 2020-05-27 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.2)
#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
#> fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.2)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
#> ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.2)
#> ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.2)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
#> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.2)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.2)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.2)
#> igraph 1.2.11 2022-01-04 [1] CRAN (R 4.1.2)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lattice * 0.20-45 2021-09-22 [2] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.2)
#> magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
#> MASS * 7.3-54 2021-05-03 [2] CRAN (R 4.1.2)
#> Matrix 1.3-4 2021-06-01 [2] CRAN (R 4.1.2)
#> matrixStats 0.61.0 2021-09-17 [1] CRAN (R 4.1.2)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.2)
#> VP mixOmics * 6.19.3 2021-11-18 [?] Bioconductor (R 4.1.2) (on disk 6.18.1)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.2)
#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
#> pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.2)
#> pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.2)
#> plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.2)
#> processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.2)
#> ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.2)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.2)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.1)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.1)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.2)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.2)
#> rARPACK 0.11-0 2016-03-10 [1] CRAN (R 4.1.2)
#> RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.1.1)
#> Rcpp 1.0.8.2 2022-03-11 [1] CRAN (R 4.1.2)
#> remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.2)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.2)
#> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.2)
#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.3)
#> rmarkdown 2.13 2022-03-10 [1] CRAN (R 4.1.3)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.2)
#> RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.1.2)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.2)
#> scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.2)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.2)
#> styler 1.7.0 2022-03-13 [1] CRAN (R 4.1.2)
#> testthat * 3.1.2 2022-01-20 [1] CRAN (R 4.1.2)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.1.2)
#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
#> usethis * 2.1.5 2021-12-09 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.2)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.2)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2)
#> xfun 0.30 2022-03-02 [1] CRAN (R 4.1.2)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
#>
#> [1] C:/Users/Work/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2patched/library
#>
#> V -- Loaded and on-disk version mismatch.
#> P -- Loaded and on-disk path mismatch.
#>
#> ------------------------------------------------------------------------------