Printing p values and fdr-adjusted p values (learn_network)
Hi, I would need to obtain the p-values of each edge to evaluate the effect of preprocessing the ASV table. However, I didn't manage to access p-values or FDR-adjusted p-values in the output of learn_network. Is there any way to obtain these?
Hi Anna,
Unfortunately, returning p-values is currently not supported, and I'm not yet decided whether it should be. The algorithmic framework used by FlashWeave only follows the classic pairwise hypothesis test + FDR-adjustment workflow in the univariate case (max_k=0). The conditioning search operates differently (more algorithmically and heuristically), and while it has been shown to have good FDR performance in practice (see Aliferis 2010, referenced in the FlashWeave publication), many factors influence which tests are actually performed or can be skipped (e.g. the employed heuristics). In addition, there are internal optimizations to keep the huge number of performed tests manageable, which affect p-value accuracy and comparability in some cases.
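For context, the classic workflow that applies in the univariate case (max_k=0) could be sketched as follows. This is a toy illustration with SciPy and statsmodels, not FlashWeave's actual implementation, and the data and test choice (Spearman) are placeholders:

```python
# Sketch of the classic pairwise-test + FDR-adjustment workflow
# (univariate case, max_k=0). Toy data; NOT FlashWeave internals.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
abundances = rng.poisson(5, size=(50, 8))  # 50 samples x 8 ASVs (toy counts)

pairs, pvals = [], []
for i, j in combinations(range(abundances.shape[1]), 2):
    rho, p = spearmanr(abundances[:, i], abundances[:, j])
    pairs.append((i, j))
    pvals.append(p)

# Benjamini-Hochberg adjustment across all pairwise tests
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
edges = [pair for pair, keep in zip(pairs, reject) if keep]
print(f"{len(edges)} of {len(pairs)} pairs pass FDR < 0.05")
```

Once conditioning sets (max_k > 0) and the skipping heuristics enter the picture, the set of performed tests is no longer this simple exhaustive grid, which is why per-edge p-values stop being directly comparable.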
When interpreting FlashWeave edge weights, I'd think less statistically and more in terms of scores produced by a highly optimized algorithm, if that makes sense.
Hi, thank you! I see that it's indeed not possible to do the check I intended. I've been pre-filtering ASV tables with different rarity thresholds (e.g. abundance < 0.01% vs. < 0.1%, etc.). With low thresholds I would expect an excess of p-values around 1 (in a regular pairwise matrix of co-occurrences), as a way to control the false discovery rate (https://en.wikipedia.org/wiki/Q-value_(statistics)) and make an informed decision on ASV table filtering. I see that this isn't possible and doesn't make much sense for associations built with FlashWeave. Is there any way to test something similar or equivalent in the context of FlashWeave?
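On a plain pairwise p-value matrix, the "excess of p-values around 1" idea could be checked roughly like this, using Storey's pi0 estimator as a stand-in for the full q-value procedure (toy simulated p-values, not FlashWeave output):

```python
# The proportion of p-values above a cutoff lambda estimates pi0,
# the fraction of true null hypotheses (Storey's estimator).
# Toy data only; FlashWeave does not expose p-values.
import numpy as np

rng = np.random.default_rng(42)
# Mixture: 80% true nulls (uniform p-values) + 20% signals (small p-values)
pvals = np.concatenate([rng.uniform(0, 1, 800), rng.beta(0.5, 10, 200)])

lam = 0.5  # tuning parameter lambda
pi0 = np.mean(pvals > lam) / (1 - lam)  # should land near the true 0.8
print(f"estimated pi0 = {pi0:.2f}")
```

With aggressive rarity filtering, the hope expressed above is that pi0 (and hence the excess of near-1 p-values) shrinks; but as discussed, this diagnostic presupposes a classic exhaustive testing scheme.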
Depending on the mode, FlashWeave will use approximate rules to decide whether a test is reliable or should be skipped, as can happen for instance with rare organisms. Unfortunately, no easy rules are available for correlation tests (sensitive=true, the mode you use judging from the other issue?). In addition, heterogeneous=true also automatically removes tests with too little co-occurrence information, but I think that also doesn't apply to your case.
To get an initial idea, you could perhaps check the number of reported edges per OTU and plot that against rarity: if all OTUs below a certain threshold (say 0.01%) have no edges, that may indicate insufficient information for these OTUs. However, this is rather indirect and there could be other factors at play.