SpiecEasi
SpiecEasi copied to clipboard
Add mb method compatibility with huge v1.3.3 and lower
In the process of addressing reviewer comments for a manuscript, I re-analyzed some networks using SpiecEasi v.1.1.2 and found that I got very different networks than I had initially estimated using an older version of SpiecEasi. The v.1.1.2 networks are all approximately 1/2 the size (i.e., contain ~half the number of edges) as networks reconstructed from an older version of SpiecEasi. Unfortunately, I didn't do a good job keeping track of which version of SpiecEasi I initially used to analyze my data. Based on the dates the files were created, I believe I had initially used SpiecEasi v.1.0 to reconstruct networks. I'm hoping to learn more about why I might be getting such different networks. Is the newer version of SpiecEasi doing a better job of eliminating false positive detection of edges?
I have confirmed that the input data is the same for both old and new networks. Below is an example comparing an old and recent network. You'll notice that the values for lambda.min.ratio and nlambda are different between the old and recent network -- this is because applying the same values used in the old analysis caused pulsar/stars to fail and return an empty matrix. This isn't surprising given new networks have many fewer edges than old networks.
Confirmation that input data is the same between old and recent networks NOTE:BM.F9.old and BM.F9 are phyloseq objects
table(taxa_sums(BM.F9.old) == taxa_sums(BM.F9))
TRUE 96
Comparison of old and recent networks
seBM.F9.old Pulsar-selected refit of sparseiCov Path length: 100 Graph dim: 96 Criterion: stars... sparsity 0.0464
seBM.F9 Pulsar-selected refit of sparseiCov Path length: 30 Graph dim: 96 Criterion: stars... sparsity 0.0171
Comparison of Stability between old and recent network
getStability(seBM.F9.old) #0.04869439 [1] 0.04847439
getStability(seBM.F9) [1] 0.04869439
Comparison of number of edges between old and recent networks
sum(getRefit(seBM.F9.old))/2 [1] 214
sum(getRefit(seBM.F9))/2 [1] 79
Comparison of Stability at different lambda values
seBM.F9.old$select$stars$summary [1] 0.00000000 0.00000000 0.03307088 0.03513404 0.03759228 0.03946386 0.04232351 [8] 0.04595018 0.04847439 0.05087789 0.05426000 0.05601544 0.05904456 0.06190246 [15] 0.06532912 0.06825123 0.07000035 0.07615404 0.08235368 0.08306386 0.08696982 [22] 0.08708877 0.09067895 0.09321298 0.09645544 0.09777123 0.09967088 0.10562035 [29] 0.10713193 0.11165088 0.11726772 0.12275860 0.12413930 0.12924035 0.14120246 [36] 0.14291509 0.14810632 0.15864211 0.16291649 0.16999333 0.17213193 0.18009509 [43] 0.18292877 0.19364105 0.20828491 0.21615860 0.21615860 0.22523228 0.23095123 [50] 0.23471649 0.24488281 0.25549193 0.26176246 0.27178070 0.29049509 0.29449474 [57] 0.29659789 0.29785754 0.30193193 0.31222491 0.33893719 0.34465439 0.35824246 [64] 0.35979404 0.36257123 0.36503193 0.37212772 0.37681789 0.38583544 0.40313088 [71] 0.40313088 0.40313088 0.41162421 0.41690386 0.42003930 0.42362421 0.42597825 [78] 0.43074982 0.43130035 0.43556667 0.43556667 0.44348351 0.44974281 0.45038632 [85] 0.45555439 0.45776772 0.46294702 0.46294702 0.47729298 0.47729298 0.47807544 [92] 0.48083088 0.48271930 0.48271930 0.49554246 0.49554246 0.49554246 0.49746000 [99] 0.49746000 0.50066070
seBM.F9$select$stars$summary [1] 0.00000000 0.02388772 0.03600035 0.04869439 0.05976807 0.07066456 0.08022140 [8] 0.08953509 0.10406912 0.10921439 0.15629930 0.15933298 0.16191368 0.16191368 [15] 0.16191368 0.16367018 0.16833158 0.17518912 0.18117228 0.19281965 0.20351439 [22] 0.21562105 0.22312632 0.23621088 0.24706035 0.25679895 0.26539930 0.27591684 [29] 0.28660070 0.29636982
Lambda values considered for each network
seBM.F9.old$lambda [1] 1.0000000000 0.9213303256 0.8488495689 0.7820708497 0.7205455906 0.6638605036 [7] 0.6116348139 0.5635177023 0.5191859481 0.4783417586 0.4407107682 0.4060401956 [13] 0.3740971456 0.3446670450 0.3175522008 0.2925704725 0.2695540487 0.2483483195 [19] 0.2288108380 0.2108103639 0.1942259812 0.1789462865 0.1648686404 0.1518984782 [25] 0.1399486744 0.1289389577 0.1187953719 0.1094497787 0.1008394002 0.0929063974 [31] 0.0855974814 0.0788635554 0.0726593852 0.0669432950 0.0616768878 0.0568247871 [37] 0.0523543996 0.0482356960 0.0444410095 0.0409448498 0.0377237318 0.0347560181 [43] 0.0320217735 0.0295026310 0.0271816686 0.0250432956 0.0230731477 0.0212579907 [49] 0.0195856315 0.0180448362 0.0166252548 0.0153173514 0.0141123404 0.0130021272 [55] 0.0119792540 0.0110368500 0.0101685846 0.0093686254 0.0086315987 0.0079525536 [61] 0.0073269288 0.0067505217 0.0062194604 0.0057301774 0.0052793863 0.0048640587 [67] 0.0044814047 0.0041288541 0.0038040385 0.0035047760 0.0032290564 0.0029750276 [73] 0.0027409832 0.0025253509 0.0023266824 0.0021436430 0.0019750033 0.0018196305 [79] 0.0016764807 0.0015445925 0.0014230799 0.0013111267 0.0012079808 0.0011129493 [85] 0.0010253940 0.0009447266 0.0008704052 0.0008019307 0.0007388431 0.0006807186 [91] 0.0006271667 0.0005778277 0.0005323701 0.0004904888 0.0004519022 0.0004163512 [97] 0.0003835970 0.0003534195 0.0003256161 0.0003000000
seBM.F9$lambda [1] 1.00000000 0.85316785 0.72789538 0.62101694 0.52983169 0.45203537 0.38566204 [8] 0.32903446 0.28072162 0.23950266 0.20433597 0.17433288 0.14873521 0.12689610 [15] 0.10826367 0.09236709 0.07880463 0.06723358 0.05736153 0.04893901 0.04175319 [22] 0.03562248 0.03039195 0.02592944 0.02212216 0.01887392 0.01610262 0.01373824 [29] 0.01172102 0.01000000
Thanks for the report. Do you know which version of R you were using in the initial analysis?
Have you tried downgrading to SpiecEasi v1.0 to test that version? If you do you might also want to check if you were using the huge package 1.3.0 there was a brief compatibility issue in a release (see issue #107 )
Also, have you tried matching nlambda
and lambda.min.ratio
between the old and new versions? Those seem to be inconsistent between the two runs...
I happened to find a conda environment that I had made to save these package versions! I was using R 3.5.1, huge 1.3.2, and SpiecEasi 1.0.7 (session Info below). Within this conda environment, I was able to recreate my old results exactly. When I use the same input data in R 4.1.2, huge 1.3.5, and SpiecEasi 1.1.2, my networks contain ~1/2 as many edges.
From Issue #107 that you linked above, I wasn't sure if the incompatibility was for huge 1.3.0 or all huge versions <1.3.4.
When I use the same values of nlambda and lambda.min.ratio from old networks in recent network reconstruction, pulsar/stars fail to complete. This may be linked to Issue #196, because I cannot increase nlambda beyond 50.
Session Info from initial network reconstruction
sessionInfo() R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: macOS 10.16
Matrix products: default BLAS/LAPACK: /usr/local/anaconda3/envs/r_3.5.1/lib/R/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] SpiecEasi_1.0.7 phyloseq_1.26.1 huge_1.3.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 pillar_1.4.3 compiler_3.5.1
[4] plyr_1.8.6 XVector_0.22.0 iterators_1.0.12
[7] tools_3.5.1 zlibbioc_1.28.0 tibble_3.0.1
[10] jsonlite_1.6.1 lifecycle_0.2.0 nlme_3.1-147
[13] rhdf5_2.26.2 gtable_0.3.0 lattice_0.20-41
[16] mgcv_1.8-31 pkgconfig_2.0.3 rlang_0.4.5
[19] Matrix_1.2-18 foreach_1.5.0 igraph_1.2.5
[22] parallel_3.5.1 pulsar_0.3.6 stringr_1.4.0
[25] cluster_2.1.0 vctrs_0.2.4 Biostrings_2.50.2
[28] S4Vectors_0.20.1 IRanges_2.16.0 multtest_2.38.0
[31] stats4_3.5.1 ade4_1.7-15 grid_3.5.1
[34] glue_1.4.0 Biobase_2.42.0 data.table_1.11.8
[37] R6_2.4.1 survival_3.1-12 VGAM_1.1-5
[40] reshape2_1.4.4 Rhdf5lib_1.4.2 ggplot2_3.3.0
[43] magrittr_1.5 splines_3.5.1 ellipsis_0.3.0
[46] scales_1.1.0 codetools_0.2-16 MASS_7.3-50
[49] BiocGenerics_0.28.0 biomformat_1.10.1 permute_0.9-5
[52] colorspace_1.4-1 ape_5.3 stringi_1.4.6
[55] munsell_0.5.0 vegan_2.5-6 crayon_1.3.4
Session Info from Recent Network Reconstruction
sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 11.6.1
Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] igraph_1.2.10 SpiecEasi_1.1.2 viridis_0.6.2 viridisLite_0.4.0
[5] philr_1.20.0 vegan_2.5-7 lattice_0.20-45 permute_0.9-5
[9] ape_5.6 decontam_1.14.0 phyloseq_1.38.0 factoextra_1.0.7
[13] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[17] readr_2.1.1 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5
[21] tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] VGAM_1.1-5 colorspace_2.0-2 ggtree_3.2.1
[4] ggsignif_0.6.3 ellipsis_0.3.2 XVector_0.34.0
[7] fs_1.5.2 aplot_0.1.1 rstudioapi_0.13
[10] ggpubr_0.4.0 farver_2.1.0 ggrepel_0.9.1
[13] fansi_0.5.0 lubridate_1.8.0 xml2_1.3.3
[16] splines_4.1.2 codetools_0.2-18 knitr_1.37
[19] ade4_1.7-18 jsonlite_1.7.2 broom_0.7.10
[22] cluster_2.1.2 dbplyr_2.1.1 compiler_4.1.2
[25] httr_1.4.2 backports_1.4.1 assertthat_0.2.1
[28] Matrix_1.4-0 fastmap_1.1.0 lazyeval_0.2.2
[31] cli_3.1.0 htmltools_0.5.2 tools_4.1.2
[34] gtable_0.3.0 glue_1.6.0 GenomeInfoDbData_1.2.7
[37] reshape2_1.4.4 fastmatch_1.1-3 Rcpp_1.0.7
[40] carData_3.0-4 Biobase_2.54.0 cellranger_1.1.0
[43] rhdf5filters_1.6.0 vctrs_0.3.8 Biostrings_2.62.0
[46] multtest_2.50.0 nlme_3.1-153 iterators_1.0.13
[49] xfun_0.29 rvest_1.0.2 lifecycle_1.0.1
[52] phangorn_2.8.1 rstatix_0.7.0 zlibbioc_1.40.0
[55] MASS_7.3-54 scales_1.1.1 hms_1.1.1
[58] parallel_4.1.2 biomformat_1.22.0 huge_1.3.5
[61] rhdf5_2.38.0 yaml_2.2.1 gridExtra_2.3
[64] ggfun_0.0.4 yulab.utils_0.0.4 stringi_1.7.6
[67] S4Vectors_0.32.3 foreach_1.5.1 tidytree_0.3.6
[70] BiocGenerics_0.40.0 shape_1.4.6 GenomeInfoDb_1.30.0
[73] rlang_0.4.12 pkgconfig_2.0.3 bitops_1.0-7
[76] evaluate_0.14 Rhdf5lib_1.16.0 treeio_1.18.1
[79] patchwork_1.1.1 labeling_0.4.2 tidyselect_1.1.1
[82] plyr_1.8.6 magrittr_2.0.1 R6_2.5.1
[85] IRanges_2.28.0 generics_0.1.1 DBI_1.1.2
[88] mgcv_1.8-38 pillar_1.6.4 haven_2.4.3
[91] withr_2.4.3 survival_3.2-13 abind_1.4-5
[94] RCurl_1.98-1.5 pulsar_0.3.7 modelr_0.1.8
[97] crayon_1.4.2 car_3.0-12 utf8_1.2.2
[100] tzdb_0.2.0 rmarkdown_2.11 grid_4.1.2
[103] readxl_1.3.1 data.table_1.14.2 reprex_2.0.1
[106] digest_0.6.29 gridGraphics_0.5-1 glmnet_4.1-3
[109] stats4_4.1.2 munsell_0.5.0 ggplotify_0.1.0
[112] quadprog_1.5-8
Thanks Kelly - very perplexing. I think the huge issue was fixed in v1.3.3. The other possibility I can think of is that the default sample.kind changed in R version 3.6+.
As with the other issue, if you can send me the data I could help debug.
Thank you for the quick response and helpful suggestions! I tried reverting to the <3.6 sample.kind, but it did not appear to replicate my old results.
In R 4.1.2 (SpiecEasi 1.1.2 and huge 1.3.5), I added "RNGkind(sample.kind="Rounding")" to the beginning of the script as suggested in the link you provided. Then I re-ran the above network and the network size was similar to default sample.kind in R 4.1.2, not to the network size I obtained when using R 3.5.1 (SpiecEasie 1.0.7, huge 1.3.2).
I've sent my data to your email! Let me know if I can provide any other information or if I should try some other things to assist in the debug process.
Thanks again!
An update: this does indeed look like a huge version issue, but I was mistaken in my original comment. huge v1.3.2 and v1.3.2 releases were inconsistent with older (v1.2 results) and at my request, this was "corrected" with huge v1.3.4 and up. I am sorry you just got a bit unlucky with the timing of package releases.
I can recover [close to] your original results on R 4.1 by running:
remove.packages(c("huge", "SpiecEasi")
install.packages('https://cran.r-project.org/src/contrib/Archive/huge/huge_1.3.2.tar.gz')
## this version works too
# install.packages('https://cran.r-project.org/src/contrib/Archive/huge/huge_1.3.3.tar.gz')
devtools::install_git("SpiecEasi", ref="hugemb-patch")
library(SpiecEasi)
RNGkind(sample.kind="Rounding")
pargs<- list(seed=30010, rep.num=50, ncores=8)
seBM.F9<- spiec.easi(BM.fragment.list[["BM.F9"]], method='mb', lambda.min.ratio=3e-4, nlambda=100, pulsar.params=pargs)
getStability(seBM.F9) #0.04750667
sum(getRefit(seBM.F9))/2 # 212
Note you should upgrade off the hugemb-patch
branch again, I pushed some changes for huge
backwards compatibility.