`prospectr::duplex()`: Avoid error "1:nrow(d) : argument of length 0" when `2 * k` is `nrow(X)`.

Open philipp-baumann opened this issue 1 year ago • 4 comments

While working on a cLHS-DUPLEX sampling problem within clusters, I hit an unlikely but unlucky edge case that throws an error (see the next message). I first use conditioned Latin hypercube sampling (cLHS) to select the spectra that will undergo reference analysis, so that they are representative of the distributions of the spectral variables. The subsequent DUPLEX step then assigns each selected spectrum to either calibration/tuning or validation. In this case, ntot_cluster_ref = 2 * k_duplex_fun: half of the spectra selected for reference analysis go to calibration and half to validation, so if DUPLEX runs successfully, no spectra are left over.

However, in some edge cases where only one candidate sample is left for "test" inside the while (i < k) loop, "Error in 1:nrow(d) : argument of length 0" occurs. The reason is that 1:nrow(d) is called after d has collapsed to a length-2 vector instead of the expected matrix, because [ defaults to drop = TRUE for matrices and data frames, and nrow() of a plain vector is NULL. To be on the safe side, this PR both keeps all dimensions when removing previously assigned samples (via drop = FALSE) and replaces 1:nrow(d) with seq_len(nrow(d)). The commits in this branch try to avoid any unintended side effects.
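
The failure mode can be reproduced in base R alone. The following is a minimal sketch, assuming a small made-up matrix `d` as a stand-in for the object inside duplex(); the values and the names `d_dropped`, `d_kept` and `d_empty` are illustrative only and are not taken from the package source.

```r
# Illustrative stand-in for the matrix `d` (values are made up)
d <- matrix(1:6, nrow = 3, ncol = 2)

# Removing rows until a single one is left: with the default drop = TRUE
# the result silently collapses to a plain vector
d_dropped <- d[-c(1, 2), ]
is.matrix(d_dropped)  # FALSE
nrow(d_dropped)       # NULL

# 1:NULL is what raises "argument of length 0"
err <- tryCatch(1:nrow(d_dropped), error = function(e) conditionMessage(e))

# drop = FALSE keeps the result a one-row matrix, so nrow() stays meaningful
d_kept <- d[-c(1, 2), , drop = FALSE]
dim(d_kept)

# seq_len() is additionally safe when no rows are left at all:
# it returns an empty sequence, whereas 1:0 counts down from 1 to 0
d_empty <- d[-(1:3), , drop = FALSE]
seq_len(nrow(d_empty))
1:nrow(d_empty)
```

Using both drop = FALSE and seq_len() is deliberately redundant: the first keeps the object a matrix, the second keeps the loop bounds sane even if zero rows remain.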

philipp-baumann, Oct 22 '22 21:10

This is a reprex before the fix:

library("prospectr")
#> prospectr version 0.2.6 -- 'chicago'
#> check the github repository at: https://github.com/l-ramirez-lopez/prospectr/
data(NIRsoil)

spec <- NIRsoil$spc
spec_pca <- stats::prcomp(spec, center = TRUE, scale. = FALSE)
# arbitrarily assuming 3 PCs would explain the desired variance threshold
# in total 6 samples for lab analysis, where 3 serve the purpose of calibration
# and 3 validation
scores <- as.data.frame(spec_pca$x)[1:6, 1:3]

duplex(X = scores, k = nrow(scores) / 2, metric = "mahal")
#> Error in 1:nrow(d): argument of length 0

Created on 2022-10-23 by the reprex package (v2.0.1)

philipp-baumann, Oct 22 '22 22:10

Sorry, 2eb7a17 is now the correct fix across rows.

philipp-baumann, Oct 22 '22 23:10

Now with this PR branch:

library("prospectr")
#> prospectr version 0.2.6 -- 'chicago'
#> check the github repository at: https://github.com/l-ramirez-lopez/prospectr/
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       Ubuntu 20.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Zurich
#>  date     2022-10-23
#>  pandoc   2.18 @ /usr/lib/rstudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.2)
#>  cachem        1.0.6   2021-08-19 [1] CRAN (R 4.1.1)
#>  callr         3.7.2   2022-08-22 [1] CRAN (R 4.1.3)
#>  cli           3.4.0   2022-09-08 [1] CRAN (R 4.1.3)
#>  codetools     0.2-18  2020-11-04 [2] CRAN (R 4.1.3)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.1.3)
#>  devtools      2.4.2   2021-06-07 [1] CRAN (R 4.1.1)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate      0.16    2022-08-09 [1] CRAN (R 4.1.3)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.1.3)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.1)
#>  foreach       1.5.2   2022-02-02 [1] CRAN (R 4.1.2)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.2)
#>  htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.1.3)
#>  iterators     1.0.14  2022-02-05 [1] CRAN (R 4.1.2)
#>  knitr         1.38    2022-03-25 [1] CRAN (R 4.1.3)
#>  lifecycle     1.0.2   2022-09-09 [1] CRAN (R 4.1.3)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.3)
#>  mathjaxr      1.4-0   2021-03-01 [1] CRAN (R 4.1.1)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.1.2)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.1.3)
#>  pkgbuild      1.3.1   2021-12-20 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.2)
#>  pkgload       1.3.0   2022-06-27 [1] CRAN (R 4.1.3)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.1.2)
#>  processx      3.7.0   2022-07-07 [1] CRAN (R 4.1.3)
#>  prospectr   * 0.2.6   2022-10-22 [1] Github (spectral-cockpit/prospectr@2eb7a17)
#>  ps            1.7.1   2022-06-18 [1] CRAN (R 4.1.3)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.2)
#>  Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.1.3)
#>  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.2)
#>  rlang         1.0.5   2022-08-31 [1] CRAN (R 4.1.3)
#>  rmarkdown     2.13    2022-03-10 [1] CRAN (R 4.1.3)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.1.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.2)
#>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.1.1)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.1.3)
#>  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.1.3)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.2)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.1.3)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.3)
#>  xfun          0.30    2022-03-02 [1] CRAN (R 4.1.3)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.1.3)
#> 
#>  [1] /home/philipp/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /opt/R/4.1.3/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
data(NIRsoil)

spec <- NIRsoil$spc
spec_pca <- stats::prcomp(spec, center = TRUE, scale. = FALSE)
# arbitrarily assuming 3 PCs would explain the desired variance threshold
# in total 6 samples for lab analysis, where 3 serve the purpose of calibration
# and 3 validation
scores <- as.data.frame(spec_pca$x)[1:6, 1:3]

duplex(X = scores, k = nrow(scores) / 2, metric = "mahal")
#> $model
#> [1] 5 3 4
#> 
#> $test
#> [1] 2 1 6

Created on 2022-10-23 by the reprex package (v2.0.1)

philipp-baumann, Oct 22 '22 23:10

9bd0abf addresses the issue of two identical test samples being returned when nrow(X) == 4 (because of https://github.com/l-ramirez-lopez/prospectr/blob/da9b8924ab1fa620f7e561fb3d239de23bd65171/R/duplex.R#L158). It is again not a very standard case, but in my opinion it is nice to cover. Somehow the arrayInd approach does not work when dim(D) = c(2, 2), so I thought that returning the two remaining samples early (exceptionally one, if nrow(X) == 3, i.e. whatever is left) does the job. Maybe you can find a more elegant way to program this last special case (I promise no more in this PR branch ;)).

Here is a short reprex:

library("prospectr") # current version on github main branch; see session info below
#> prospectr version 0.2.6 -- 'chicago'
#> check the github repository at: https://github.com/l-ramirez-lopez/prospectr/
data(NIRsoil)

spec <- NIRsoil$spc
spec_pca <- stats::prcomp(spec, center = TRUE, scale. = FALSE)
# 4 rows only
scores <- as.data.frame(spec_pca$x)[1:4, 1:3]

# returns twice the same row numbers/sample IDs for the test set
duplex(X = scores, k = nrow(scores) / 2, metric = "mahal")
#> $model
#> [1] 3 1
#> 
#> $test
#> [1] 4 4

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       Ubuntu 20.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Zurich
#>  date     2022-10-25
#>  pandoc   2.5 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>    cli           3.4.1   2022-09-23 [1] CRAN (R 4.1.3)
#>    codetools     0.2-18  2020-11-04 [2] CRAN (R 4.1.3)
#>    digest        0.6.30  2022-10-18 [1] CRAN (R 4.1.3)
#>    evaluate      0.17    2022-10-07 [1] CRAN (R 4.1.3)
#>    fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.3)
#>    foreach       1.5.2   2022-02-02 [1] CRAN (R 4.1.3)
#>    fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.3)
#>    glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.3)
#>    highr         0.9     2021-04-16 [1] CRAN (R 4.1.3)
#>    htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.1.3)
#>    iterators     1.0.14  2022-02-05 [1] CRAN (R 4.1.3)
#>    knitr         1.40    2022-08-24 [1] CRAN (R 4.1.3)
#>    lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.1.3)
#>    magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.3)
#>    mathjaxr      1.6-0   2022-02-28 [1] CRAN (R 4.1.3)
#>    prospectr   * 0.2.6   2022-10-25 [1] Github (l-ramirez-lopez/prospectr@da9b892)
#>    Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.1.3)
#>    reprex        2.0.2   2022-08-17 [1] CRAN (R 4.1.3)
#>    rlang         1.0.6   2022-09-24 [1] CRAN (R 4.1.3)
#>    rmarkdown     2.17    2022-10-07 [1] CRAN (R 4.1.3)
#>  P sessioninfo   1.2.2   2021-12-06 [?] CRAN (R 4.1.3)
#>    stringi       1.7.8   2022-07-11 [1] CRAN (R 4.1.3)
#>    stringr       1.4.1   2022-08-20 [1] CRAN (R 4.1.3)
#>    withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.3)
#>    xfun          0.34    2022-10-18 [1] CRAN (R 4.1.3)
#>    yaml          2.3.6   2022-10-18 [1] CRAN (R 4.1.3)
#> 
#>  [1] /home/philipp/git/spectral-cockpit/reprex-prospectr-duplex-no-drop-dims/renv/profiles/origin/renv/library/R-4.1/x86_64-pc-linux-gnu
#>  [2] /opt/R/4.1.3/lib/R/library
#> 
#>  P ── Loaded and on-disk path mismatch.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2022-10-25 with reprex v2.0.2
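
The duplicated test index in the output above (`$test` is `4 4`) is the kind of result a small sanity check can catch. The following is a minimal sketch, assuming a hypothetical helper `check_split()` that is not part of prospectr; it only inspects the `model` and `test` index vectors that duplex() returns, using the values printed in the two reprexes in this thread.

```r
# Hypothetical helper (not part of prospectr): reports row indices that a
# duplex()-style result assigns more than once, and whether the model and
# test sets are disjoint.
check_split <- function(split) {
  idx <- c(split$model, split$test)
  list(
    duplicated = unique(idx[duplicated(idx)]),
    disjoint   = length(intersect(split$model, split$test)) == 0
  )
}

# Values copied from the 4-row reprex output (main branch at da9b892)
before <- list(model = c(3, 1), test = c(4, 4))
check_split(before)$duplicated  # 4

# Values copied from the 6-row reprex output on the PR branch (2eb7a17)
after <- list(model = c(5, 3, 4), test = c(2, 1, 6))
length(check_split(after)$duplicated)  # 0
```

A check like this could serve as a regression test for the small-n cases (nrow(X) of 3 or 4), whichever way the special case ends up being implemented.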

philipp-baumann, Oct 25 '22 13:10

Thank you @philipp-baumann

l-ramirez-lopez, Oct 31 '22 19:10