mixOmics
`perf()` on `pls` unable to handle near zero variance features
🐞 Describe the bug:
When running the `perf()` function on a `pls` object (using any PLS mode), if there are features which have near zero variance (nzv), the following error is raised:
Error in Ypred[omit, , h] <- Y.hat[, , 1] :
number of items to replace is not a multiple of replacement length
This bug was raised by two users on the Discourse forum: https://mixomics-users.discourse.group/t/pls-and-diablo-tuning/742/4
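For readers unfamiliar with this error, the following base R sketch reproduces the same message outside of mixOmics. It only assumes what the error text suggests: an array slice with one column per feature receives a prediction matrix with fewer columns. The dimensions and the object contents are made up for illustration; only the names `Ypred`, `Y.hat`, `omit` and `h` are taken from the error message.

```r
# Illustrative dimensions only: 8 samples, 5 features, 2 components.
n <- 8
p <- 5
ncomp <- 2
Ypred <- array(0, dim = c(n, p, ncomp))

omit <- 1:3   # samples held out in the current fold (assumed)
h <- 1        # current component

# Predictions that cover only 3 of the 5 features,
# as if 2 features had been dropped before prediction.
Y.hat <- array(1, dim = c(length(omit), 3, 1))

# The slice has 3 x 5 = 15 cells but only 3 x 3 = 9 values are supplied.
msg <- tryCatch(
  {
    Ypred[omit, , h] <- Y.hat[, , 1]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Because 15 is not a multiple of 9, R refuses to recycle the values and stops with the error shown above.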
🔍 reprex results from reproducible example including sessioninfo():
library(mixOmics)
data("liver.toxicity")
# reducing number of features to reduce run time
X <- liver.toxicity$gene[, 1:1000]
Y <- liver.toxicity$clinic
# to reproduce error, we need to induce some features to have near zero variance
X[, c(1, 23, 62, 234, 789)] <- 0
pls.obg <- pls(Y, X, ncomp = 4)
#> Warning in cor(A[[k]], variates.A[[k]]): the standard deviation is zero
pls.perf.obj <- perf(pls.obg, validation = "Mfold", folds = 4,
                     progressBar = FALSE,
                     nrepeat = 3)
#> Error in Ypred[omit, , h] <- Y.hat[, , 1]: number of items to replace is not a multiple of replacement length
Created on 2022-03-28 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 Patched (2021-11-16 r81220)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_Australia.1252
#> ctype English_Australia.1252
#> tz Australia/Sydney
#> date 2022-03-28
#> pandoc 2.14.2 @ C:/Users/Work/AppData/Local/Pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.3)
#> BiocParallel 1.28.3 2021-12-09 [1] Bioconductor
#> cli 3.2.0 2022-02-14 [1] CRAN (R 4.1.2)
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.1.2)
#> corpcor 1.6.10 2021-09-16 [1] CRAN (R 4.1.1)
#> crayon 1.5.0 2022-02-14 [1] CRAN (R 4.1.2)
#> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.3)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> dplyr 1.0.8 2022-02-08 [1] CRAN (R 4.1.2)
#> ellipse 0.4.2 2020-05-27 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.2)
#> evaluate 0.15 2022-02-18 [1] CRAN (R 4.1.2)
#> fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.2)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
#> ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.2)
#> ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.1.2)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.2)
#> gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.2)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.2)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.2)
#> igraph 1.2.11 2022-01-04 [1] CRAN (R 4.1.2)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lattice * 0.20-45 2021-09-22 [2] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.2)
#> magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
#> MASS * 7.3-54 2021-05-03 [2] CRAN (R 4.1.2)
#> Matrix 1.3-4 2021-06-01 [2] CRAN (R 4.1.2)
#> matrixStats 0.61.0 2021-09-17 [1] CRAN (R 4.1.2)
#> mixOmics * 6.18.1 2021-11-18 [1] Bioconductor (R 4.1.2)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.2)
#> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.2)
#> plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.2)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.2)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.1)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.1)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.2)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.2)
#> rARPACK 0.11-0 2016-03-10 [1] CRAN (R 4.1.2)
#> RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.1.1)
#> Rcpp 1.0.8.2 2022-03-11 [1] CRAN (R 4.1.2)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.2)
#> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.2)
#> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.1.3)
#> rmarkdown 2.13 2022-03-10 [1] CRAN (R 4.1.3)
#> RSpectra 0.16-0 2019-12-01 [1] CRAN (R 4.1.2)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.2)
#> scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.2)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.2)
#> styler 1.7.0 2022-03-13 [1] CRAN (R 4.1.2)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.1.2)
#> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.2)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.2)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.2)
#> xfun 0.30 2022-03-02 [1] CRAN (R 4.1.2)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.1.2)
#>
#> [1] C:/Users/Work/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2patched/library
#>
#> ------------------------------------------------------------------------------
🤔 Expected behavior:
The `perf()` function is built to handle nzv features. Hence, it should ideally raise a warning that some features have near zero variance, and then proceed normally.
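As a rough illustration of the "warn and proceed" behaviour described here, the base R sketch below flags constant columns, emits a warning, and returns the columns that can still be used. The helper `drop_constant_cols()` is hypothetical and is not part of mixOmics. It also checks only for exactly zero variance, which is a simplification of a real near zero variance filter.

```r
# Hypothetical helper (not a mixOmics function): warn about constant
# columns and return the indices of the columns that remain usable.
drop_constant_cols <- function(M) {
  sds <- apply(M, 2, stats::sd)
  constant <- which(sds == 0)
  if (length(constant) > 0) {
    warning(length(constant),
            " feature(s) have zero variance and were excluded: ",
            paste(constant, collapse = ", "))
  }
  setdiff(seq_len(ncol(M)), constant)
}

set.seed(1)
M <- matrix(rnorm(40), nrow = 8, ncol = 5)
M[, c(2, 4)] <- 0   # induce two constant features

# Emits a warning, then continues with the remaining columns.
keep <- drop_constant_cols(M)
print(keep)
```

The point of the design is that the caller receives an explicit index of retained columns, which can then be used consistently wherever predictions are stored.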
💡 Possible solution:
The error occurs at line 542. It is likely a result of `Ypred` containing columns for every feature, whereas `Y.hat` contains columns only for the non-nzv features. Hence, adjusting this line to:
Ypred[omit, nzv.Y, h] = Y.hat[, , 1]
may solve the bug.
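The idea behind this change can be sketched in base R with toy data. The example assumes a hypothetical index `keep.cols` holding the columns that survive nzv filtering. Whether `nzv.Y` in the package holds the retained or the removed columns is not confirmed here, so the column index used in the real fix would need to be checked against the source.

```r
# Illustrative dimensions only: 8 samples, 5 features, 2 components.
n <- 8
p <- 5
ncomp <- 2
Ypred <- array(NA_real_, dim = c(n, p, ncomp))

omit <- 1:3   # samples held out in the current fold (assumed)
h <- 1        # current component

removed.cols <- c(2, 4)                          # assumed nzv features
keep.cols <- setdiff(seq_len(p), removed.cols)   # hypothetical index

# Predictions exist only for the retained features.
Y.hat <- array(1, dim = c(length(omit), length(keep.cols), 1))

# Indexing the target by the retained columns makes both sides 3 x 3.
Ypred[omit, keep.cols, h] <- Y.hat[, , 1]
```

With this indexing, the columns of removed features simply stay `NA` for the held-out samples instead of causing the assignment to fail.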