SpiecEasi Add mb method compatibility with huge v1.3.3 and lower

In the process of addressing reviewer comments for a manuscript, I re-analyzed some networks using SpiecEasi v.1.1.2 and found that I got very different networks than I had initially estimated using an older version of SpiecEasi. The v.1.1.2 networks are all approximately 1/2 the size (i.e., contain ~half the number of edges) as networks reconstructed from an older version of SpiecEasi. Unfortunately, I didn't do a good job keeping track of which version of SpiecEasi I initially used to analyze my data. Based on the dates the files were created, I believe I had initially used SpiecEasi v.1.0 to reconstruct networks. I'm hoping to learn more about why I might be getting such different networks. Is the newer version of SpiecEasi doing a better job of eliminating false positive detection of edges?

I have confirmed that the input data is the same for both old and new networks. Below is an example comparing an old and recent network. You'll notice that the values for lambda.min.ratio and nlambda are different between the old and recent network -- this is because applying the same values used in the old analysis caused pulsar/stars to fail and return an empty matrix. This isn't surprising given new networks have many fewer edges than old networks.

Confirmation that input data is the same between old and recent networks NOTE:BM.F9.old and BM.F9 are phyloseq objects

table(taxa_sums(BM.F9.old) == taxa_sums(BM.F9))

TRUE 96

Comparison of old and recent networks

seBM.F9.old Pulsar-selected refit of sparseiCov Path length: 100 Graph dim: 96 Criterion: stars... sparsity 0.0464

seBM.F9 Pulsar-selected refit of sparseiCov Path length: 30 Graph dim: 96 Criterion: stars... sparsity 0.0171

Comparison of Stability between old and recent network

getStability(seBM.F9.old) #0.04869439 [1] 0.04847439

getStability(seBM.F9) [1] 0.04869439

Comparison of number of edges between old and recent networks

sum(getRefit(seBM.F9.old))/2 [1] 214

sum(getRefit(seBM.F9))/2 [1] 79

Comparison of Stability at different lambda values

seBM.F9.old$select$stars$summary [1] 0.00000000 0.00000000 0.03307088 0.03513404 0.03759228 0.03946386 0.04232351 [8] 0.04595018 0.04847439 0.05087789 0.05426000 0.05601544 0.05904456 0.06190246 [15] 0.06532912 0.06825123 0.07000035 0.07615404 0.08235368 0.08306386 0.08696982 [22] 0.08708877 0.09067895 0.09321298 0.09645544 0.09777123 0.09967088 0.10562035 [29] 0.10713193 0.11165088 0.11726772 0.12275860 0.12413930 0.12924035 0.14120246 [36] 0.14291509 0.14810632 0.15864211 0.16291649 0.16999333 0.17213193 0.18009509 [43] 0.18292877 0.19364105 0.20828491 0.21615860 0.21615860 0.22523228 0.23095123 [50] 0.23471649 0.24488281 0.25549193 0.26176246 0.27178070 0.29049509 0.29449474 [57] 0.29659789 0.29785754 0.30193193 0.31222491 0.33893719 0.34465439 0.35824246 [64] 0.35979404 0.36257123 0.36503193 0.37212772 0.37681789 0.38583544 0.40313088 [71] 0.40313088 0.40313088 0.41162421 0.41690386 0.42003930 0.42362421 0.42597825 [78] 0.43074982 0.43130035 0.43556667 0.43556667 0.44348351 0.44974281 0.45038632 [85] 0.45555439 0.45776772 0.46294702 0.46294702 0.47729298 0.47729298 0.47807544 [92] 0.48083088 0.48271930 0.48271930 0.49554246 0.49554246 0.49554246 0.49746000 [99] 0.49746000 0.50066070

seBM.F9$select$stars$summary [1] 0.00000000 0.02388772 0.03600035 0.04869439 0.05976807 0.07066456 0.08022140 [8] 0.08953509 0.10406912 0.10921439 0.15629930 0.15933298 0.16191368 0.16191368 [15] 0.16191368 0.16367018 0.16833158 0.17518912 0.18117228 0.19281965 0.20351439 [22] 0.21562105 0.22312632 0.23621088 0.24706035 0.25679895 0.26539930 0.27591684 [29] 0.28660070 0.29636982

Lambda values considered for each network

seBM.F9.old$lambda [1] 1.0000000000 0.9213303256 0.8488495689 0.7820708497 0.7205455906 0.6638605036 [7] 0.6116348139 0.5635177023 0.5191859481 0.4783417586 0.4407107682 0.4060401956 [13] 0.3740971456 0.3446670450 0.3175522008 0.2925704725 0.2695540487 0.2483483195 [19] 0.2288108380 0.2108103639 0.1942259812 0.1789462865 0.1648686404 0.1518984782 [25] 0.1399486744 0.1289389577 0.1187953719 0.1094497787 0.1008394002 0.0929063974 [31] 0.0855974814 0.0788635554 0.0726593852 0.0669432950 0.0616768878 0.0568247871 [37] 0.0523543996 0.0482356960 0.0444410095 0.0409448498 0.0377237318 0.0347560181 [43] 0.0320217735 0.0295026310 0.0271816686 0.0250432956 0.0230731477 0.0212579907 [49] 0.0195856315 0.0180448362 0.0166252548 0.0153173514 0.0141123404 0.0130021272 [55] 0.0119792540 0.0110368500 0.0101685846 0.0093686254 0.0086315987 0.0079525536 [61] 0.0073269288 0.0067505217 0.0062194604 0.0057301774 0.0052793863 0.0048640587 [67] 0.0044814047 0.0041288541 0.0038040385 0.0035047760 0.0032290564 0.0029750276 [73] 0.0027409832 0.0025253509 0.0023266824 0.0021436430 0.0019750033 0.0018196305 [79] 0.0016764807 0.0015445925 0.0014230799 0.0013111267 0.0012079808 0.0011129493 [85] 0.0010253940 0.0009447266 0.0008704052 0.0008019307 0.0007388431 0.0006807186 [91] 0.0006271667 0.0005778277 0.0005323701 0.0004904888 0.0004519022 0.0004163512 [97] 0.0003835970 0.0003534195 0.0003256161 0.0003000000

seBM.F9$lambda [1] 1.00000000 0.85316785 0.72789538 0.62101694 0.52983169 0.45203537 0.38566204 [8] 0.32903446 0.28072162 0.23950266 0.20433597 0.17433288 0.14873521 0.12689610 [15] 0.10826367 0.09236709 0.07880463 0.06723358 0.05736153 0.04893901 0.04175319 [22] 0.03562248 0.03039195 0.02592944 0.02212216 0.01887392 0.01610262 0.01373824 [29] 0.01172102 0.01000000

Dec 29 '21 19:12 kspeeriful

Thanks for the report. Do you know which version of R you were using in the initial analysis?

Have you tried downgrading to SpiecEasi v1.0 to test that version? If you do you might also want to check if you were using the huge package 1.3.0 there was a brief compatibility issue in a release (see issue #107 )

Also, have you tried matching nlambda and lambda.min.ratio between the old and new versions? Those seem to be inconsistent between the two runs...

Dec 30 '21 22:12 zdk123

I happened to find a conda environment that I had made to save these package versions! I was using R 3.5.1, huge 1.3.2, and SpiecEasi 1.0.7 (session Info below). Within this conda environment, I was able to recreate my old results exactly. When I use the same input data in R 4.1.2, huge 1.3.5, and SpiecEasi 1.1.2, my networks contain ~1/2 as many edges.

From Issue #107 that you linked above, I wasn't sure if the incompatibility was for huge 1.3.0 or all huge versions <1.3.4.

When I use the same values of nlambda and lambda.min.ratio from old networks in recent network reconstruction, pulsar/stars fail to complete. This may be linked to Issue #196, because I cannot increase nlambda beyond 50.

Session Info from initial network reconstruction

sessionInfo() R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: macOS 10.16

Matrix products: default BLAS/LAPACK: /usr/local/anaconda3/envs/r_3.5.1/lib/R/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] SpiecEasi_1.0.7 phyloseq_1.26.1 huge_1.3.2

loaded via a namespace (and not attached): [1] Rcpp_1.0.4.6 pillar_1.4.3 compiler_3.5.1
[4] plyr_1.8.6 XVector_0.22.0 iterators_1.0.12
[7] tools_3.5.1 zlibbioc_1.28.0 tibble_3.0.1
[10] jsonlite_1.6.1 lifecycle_0.2.0 nlme_3.1-147
[13] rhdf5_2.26.2 gtable_0.3.0 lattice_0.20-41
[16] mgcv_1.8-31 pkgconfig_2.0.3 rlang_0.4.5
[19] Matrix_1.2-18 foreach_1.5.0 igraph_1.2.5
[22] parallel_3.5.1 pulsar_0.3.6 stringr_1.4.0
[25] cluster_2.1.0 vctrs_0.2.4 Biostrings_2.50.2
[28] S4Vectors_0.20.1 IRanges_2.16.0 multtest_2.38.0
[31] stats4_3.5.1 ade4_1.7-15 grid_3.5.1
[34] glue_1.4.0 Biobase_2.42.0 data.table_1.11.8
[37] R6_2.4.1 survival_3.1-12 VGAM_1.1-5
[40] reshape2_1.4.4 Rhdf5lib_1.4.2 ggplot2_3.3.0
[43] magrittr_1.5 splines_3.5.1 ellipsis_0.3.0
[46] scales_1.1.0 codetools_0.2-16 MASS_7.3-50
[49] BiocGenerics_0.28.0 biomformat_1.10.1 permute_0.9-5
[52] colorspace_1.4-1 ape_5.3 stringi_1.4.6
[55] munsell_0.5.0 vegan_2.5-6 crayon_1.3.4

Session Info from Recent Network Reconstruction

sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 11.6.1

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] igraph_1.2.10 SpiecEasi_1.1.2 viridis_0.6.2 viridisLite_0.4.0 [5] philr_1.20.0 vegan_2.5-7 lattice_0.20-45 permute_0.9-5
[9] ape_5.6 decontam_1.14.0 phyloseq_1.38.0 factoextra_1.0.7 [13] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[17] readr_2.1.1 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5
[21] tidyverse_1.3.1

loaded via a namespace (and not attached): [1] VGAM_1.1-5 colorspace_2.0-2 ggtree_3.2.1
[4] ggsignif_0.6.3 ellipsis_0.3.2 XVector_0.34.0
[7] fs_1.5.2 aplot_0.1.1 rstudioapi_0.13
[10] ggpubr_0.4.0 farver_2.1.0 ggrepel_0.9.1
[13] fansi_0.5.0 lubridate_1.8.0 xml2_1.3.3
[16] splines_4.1.2 codetools_0.2-18 knitr_1.37
[19] ade4_1.7-18 jsonlite_1.7.2 broom_0.7.10
[22] cluster_2.1.2 dbplyr_2.1.1 compiler_4.1.2
[25] httr_1.4.2 backports_1.4.1 assertthat_0.2.1
[28] Matrix_1.4-0 fastmap_1.1.0 lazyeval_0.2.2
[31] cli_3.1.0 htmltools_0.5.2 tools_4.1.2
[34] gtable_0.3.0 glue_1.6.0 GenomeInfoDbData_1.2.7 [37] reshape2_1.4.4 fastmatch_1.1-3 Rcpp_1.0.7
[40] carData_3.0-4 Biobase_2.54.0 cellranger_1.1.0
[43] rhdf5filters_1.6.0 vctrs_0.3.8 Biostrings_2.62.0
[46] multtest_2.50.0 nlme_3.1-153 iterators_1.0.13
[49] xfun_0.29 rvest_1.0.2 lifecycle_1.0.1
[52] phangorn_2.8.1 rstatix_0.7.0 zlibbioc_1.40.0
[55] MASS_7.3-54 scales_1.1.1 hms_1.1.1
[58] parallel_4.1.2 biomformat_1.22.0 huge_1.3.5
[61] rhdf5_2.38.0 yaml_2.2.1 gridExtra_2.3
[64] ggfun_0.0.4 yulab.utils_0.0.4 stringi_1.7.6
[67] S4Vectors_0.32.3 foreach_1.5.1 tidytree_0.3.6
[70] BiocGenerics_0.40.0 shape_1.4.6 GenomeInfoDb_1.30.0
[73] rlang_0.4.12 pkgconfig_2.0.3 bitops_1.0-7
[76] evaluate_0.14 Rhdf5lib_1.16.0 treeio_1.18.1
[79] patchwork_1.1.1 labeling_0.4.2 tidyselect_1.1.1
[82] plyr_1.8.6 magrittr_2.0.1 R6_2.5.1
[85] IRanges_2.28.0 generics_0.1.1 DBI_1.1.2
[88] mgcv_1.8-38 pillar_1.6.4 haven_2.4.3
[91] withr_2.4.3 survival_3.2-13 abind_1.4-5
[94] RCurl_1.98-1.5 pulsar_0.3.7 modelr_0.1.8
[97] crayon_1.4.2 car_3.0-12 utf8_1.2.2
[100] tzdb_0.2.0 rmarkdown_2.11 grid_4.1.2
[103] readxl_1.3.1 data.table_1.14.2 reprex_2.0.1
[106] digest_0.6.29 gridGraphics_0.5-1 glmnet_4.1-3
[109] stats4_4.1.2 munsell_0.5.0 ggplotify_0.1.0
[112] quadprog_1.5-8

Dec 31 '21 22:12 kspeeriful

Thanks Kelly - very perplexing. I think the huge issue was fixed in v1.3.3. The other possibility I can think of is that the default sample.kind changed in R version 3.6+.

As with the other issue, if you can send me the data I could help debug.

Jan 01 '22 00:01 zdk123

Thank you for the quick response and helpful suggestions! I tried reverting to the <3.6 sample.kind, but it did not appear to replicate my old results.

In R 4.1.2 (SpiecEasi 1.1.2 and huge 1.3.5), I added "RNGkind(sample.kind="Rounding")" to the beginning of the script as suggested in the link you provided. Then I re-ran the above network and the network size was similar to default sample.kind in R 4.1.2, not to the network size I obtained when using R 3.5.1 (SpiecEasie 1.0.7, huge 1.3.2).

I've sent my data to your email! Let me know if I can provide any other information or if I should try some other things to assist in the debug process.

Thanks again!

Jan 06 '22 20:01 kspeeriful

An update: this does indeed look like a huge version issue, but I was mistaken in my original comment. huge v1.3.2 and v1.3.2 releases were inconsistent with older (v1.2 results) and at my request, this was "corrected" with huge v1.3.4 and up. I am sorry you just got a bit unlucky with the timing of package releases.

I can recover [close to] your original results on R 4.1 by running:

remove.packages(c("huge", "SpiecEasi")
install.packages('https://cran.r-project.org/src/contrib/Archive/huge/huge_1.3.2.tar.gz')
## this version works too
# install.packages('https://cran.r-project.org/src/contrib/Archive/huge/huge_1.3.3.tar.gz')
devtools::install_git("SpiecEasi", ref="hugemb-patch")
library(SpiecEasi)

RNGkind(sample.kind="Rounding")
pargs<- list(seed=30010, rep.num=50, ncores=8)
seBM.F9<- spiec.easi(BM.fragment.list[["BM.F9"]], method='mb', lambda.min.ratio=3e-4, nlambda=100, pulsar.params=pargs)
getStability(seBM.F9) #0.04750667
sum(getRefit(seBM.F9))/2 # 212

Note you should upgrade off the hugemb-patch branch again, I pushed some changes for huge backwards compatibility.

Jan 23 '22 20:01 zdk123

SpiecEasi SpiecEasi copied to clipboard

Add mb method compatibility with huge v1.3.3 and lower

SpiecEasi
SpiecEasi copied to clipboard