mixOmics icon indicating copy to clipboard operation
mixOmics copied to clipboard

No parallelization in tune.spls

Open EmileFaure opened this issue 3 years ago • 2 comments
trafficstars


🐞 Describe the bug:

tune.spls is not using multicore processing even with BPPARAM set correctly.


🔍 reprex results from reproducible example including sessioninfo():

Monitoring of this code from the package vignette:

param = MulticoreParam(workers=5)
bpstart(param)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
set.seed(42)
tune.res = tune.spls( X, Y, ncomp = 3,
                      test.keepX = c(5, 10, 15),
                      test.keepY = c(3, 6, 8), measure = "cor",
                      folds = 5, nrepeat = 3, progressBar = TRUE, BPPARAM = param)

--> Only one CPU is used

tune.res = tune.spca( X, ncomp = 3,
                      test.keepX = c(5, 10, 15),
                      folds = 5, nrepeat = 3, BPPARAM = param)

--> Multiple CPUs are used in parallel.

Session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mixOmics_6.18.1     ggplot2_3.3.5       lattice_0.20-45     MASS_7.3-57        
[5] BiocParallel_1.28.3

loaded via a namespace (and not attached):
 [1] ggrepel_0.9.1        Rcpp_1.0.8.3         lubridate_1.8.0      tidyr_1.2.0         
 [5] corpcor_1.6.10       listenv_0.8.0        class_7.3-20         snow_0.4-4          
 [9] assertthat_0.2.1     digest_0.6.29        ipred_0.9-12         foreach_1.5.2       
[13] utf8_1.2.2           RSpectra_0.16-1      parallelly_1.31.1    R6_2.5.1            
[17] plyr_1.8.7           hardhat_0.2.0        stats4_4.1.2         ellipse_0.4.2       
[21] pillar_1.7.0         rlang_1.0.2          caret_6.0-92         rstudioapi_0.13     
[25] data.table_1.14.2    rpart_4.1.16         Matrix_1.4-1         rARPACK_0.11-0      
[29] splines_4.1.2        gower_1.0.0          stringr_1.4.0        igraph_1.3.1        
[33] munsell_0.5.0        tinytex_0.38         compiler_4.1.2       xfun_0.30           
[37] pkgconfig_2.0.3      globals_0.14.0       nnet_7.3-17          tidyselect_1.1.2    
[41] gridExtra_2.3        tibble_3.1.6         prodlim_2019.11.13   matrixStats_0.62.0  
[45] codetools_0.2-18     fansi_1.0.3          future_1.25.0        crayon_1.5.1        
[49] dplyr_1.0.9          withr_2.5.0          recipes_0.2.0        ModelMetrics_1.2.2.2
[53] grid_4.1.2           nlme_3.1-157         gtable_0.3.0         lifecycle_1.0.1     
[57] DBI_1.1.2            magrittr_2.0.3       pROC_1.18.0          scales_1.2.0        
[61] future.apply_1.9.0   cli_3.3.0            stringi_1.7.6        reshape2_1.4.4      
[65] timeDate_3043.102    ellipsis_0.3.2       generics_0.1.2       vctrs_0.4.1         
[69] lava_1.6.10          RColorBrewer_1.1-3   iterators_1.0.14     tools_4.1.2         
[73] glue_1.6.2           purrr_0.3.4          parallel_4.1.2       survival_3.3-1      
[77] colorspace_2.0-3     BiocManager_1.30.17 

🤔 Expected behavior:

tune.spls should be parallelized over the workers registered in the BiocParallel environment.


💡 Possible solution:

In source code of tune.spca, the call of iterations over the keeping of X is made through bplapply: test.keepX.cors <- bplapply(X=all.keepX, FUN=iter_keepX, BPPARAM = BPPARAM)

In the one of tune.spls, there seem to be no use of any of the basic functions of BiocParallel. It looks to me as the presence of the BiocParallel backend is only playing a role on whether the progress bar is printed or not. If I'm wrong, do you have any idea of why this could happen on my system ? Thank you very much !

EmileFaure avatar May 03 '22 10:05 EmileFaure

Thank you for your post @EmileFaure. What you've noted is totally correct. After I spent a bit of time exploring the source code, I've realised that this issue a bit larger than just a lack of parallelisation in tune.spls(). If you're interested, feel free to read the information in Issue #216.

As far as it should concern you, I will attempt to rectify the lack of any parallelisation in tune.spls() prior to any other fixes/enhancements.

Max-Bladen avatar May 11 '22 01:05 Max-Bladen

G'day @EmileFaure. Thought I'd let you know that I've implemented a fix for this issue and that it is accessible on branch issue-214. Please let me know if there are any issues you run into

Max-Bladen avatar May 17 '22 22:05 Max-Bladen