FlashWeave.jl

Question about using FlashWeave with sequencing data of multiple organisms

Open AnnaClo opened this issue 1 year ago • 5 comments

Hi, I need to construct a network using sequencing data for multiple organisms (e.g. bacteria, fungi, protists), each obtained by sequencing a different amplicon. I understand that I could input this data to FlashWeave combined in one table, but I'm unsure whether and how FlashWeave would handle normalization for differing sequencing depths. Since each group of organisms comes from a different amplicon, it should be normalized independently of the others. If I did the normalization outside FlashWeave, I would apply a CLR transformation to bacteria, fungi and protists independently. What is your advice? How will the FlashWeave algorithms deal with this kind of data? Is this data suitable for FlashWeave? Should I apply CLR transformations before using FlashWeave?

Thank you in advance

AnnaClo, Nov 27 '24 22:11
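For reference, the out-of-FlashWeave approach described in the question (a separate CLR transformation per amplicon) might look roughly like the sketch below. It is only an illustration: the count tables are made up, the helper names `clr` and `clr_rows` are hypothetical, and the fixed pseudocount of 1 is an assumption that differs from FlashWeave's adaptive pseudocounts.

```julia
using Statistics

# Centered log-ratio (CLR) transform of one sample (a vector of counts).
# The fixed pseudocount is an illustrative assumption, not FlashWeave's scheme.
function clr(counts::AbstractVector{<:Real}; pseudocount::Real=1)
    logs = log.(counts .+ pseudocount)
    return logs .- mean(logs)
end

# Apply CLR to every row of one table (rows = samples, columns = taxa).
clr_rows(table::AbstractMatrix{<:Real}; pseudocount::Real=1) =
    mapslices(row -> clr(row; pseudocount=pseudocount), table; dims=2)

# Toy count tables for the same three samples, one table per amplicon.
bacteria = [120 30 0; 80 0 15; 200 45 5]
fungi    = [3 0 7; 0 2 9; 1 1 4]

# Each amplicon is transformed on its own, then the columns are joined.
combined = hcat(clr_rows(bacteria), clr_rows(fungi))
```

Because each table is centered separately, every row sums to zero within the bacterial block and within the fungal block, so sequencing depth in one amplicon does not leak into the other.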

Hi Anna,

Yes, FlashWeave supports providing several tables to be normalized independently (inspired by exactly the use case you mentioned). It's unfortunately poorly documented, but can be used like this: learn_network([<bac_data_path>, <fungi_data_path>], meta_data_path; <kwargs...>). Please let me know if this works for you.

jtackm, Dec 02 '24 09:12
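Written out, the call described in the reply above might look like the following sketch. The file names are hypothetical placeholders, and the keyword arguments shown are just one possible choice; it assumes FlashWeave is installed and that all tables describe the same samples.

```julia
using FlashWeave

# Hypothetical file names; replace them with your own tables.
bac_data_path   = "bacteria_counts.tsv"
fungi_data_path = "fungi_counts.tsv"
meta_data_path  = "metadata.tsv"

# Passing a vector of paths, as suggested in the reply, lets each table
# be normalized independently of the others.
netw_results = learn_network([bac_data_path, fungi_data_path], meta_data_path;
                             sensitive=true, heterogeneous=false)

# Write the inferred network to disk for inspection in other tools.
save_network("network_output.edgelist", netw_results)
```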

Hi! I used FlashWeave for multiple organisms as you indicated and it worked wonderfully. There is one more aspect I want to ask about. I would also like to include some non-sequencing variables as nodes in the network (e.g. pH, organic matter, bacterial biomass, ...). I see that it is possible to add a 'features' table, but I don't fully understand how FlashWeave would handle those variables. Ideally, I would apply a log (or log+1) transformation to non-sequencing variables, rather than CLR with adaptive pseudocounts. Is that possible in any way in FlashWeave? Alternatively, is there a function for performing the sequencing data normalization outside learn_network? Then I would provide the differently normalized tables to learn_network and run it with normalize=false.

AnnaClo, Jan 23 '25 18:01

Hi Anna,

Metadata undergoes only the most basic preprocessing, separate from OTU normalization: discretization of the raw values if FlashWeave is run with sensitive=false, and special treatment of metadata zeros (via pseudocounts) with heterogeneous=true. Beyond that, you should pre-normalize your metadata if you have special requirements; FlashWeave will then proceed to make these values compatible with the tests being used internally. Hope that helps!

jtackm, Jan 31 '25 16:01
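As a minimal sketch of the pre-normalization suggested above, a log(x + 1) transform of non-sequencing variables could be done in plain Julia before the metadata table is written out. The variable names and values are invented for illustration, and how the resulting table should be laid out on disk for FlashWeave is not covered here.

```julia
# Hypothetical environmental measurements, one value per sample.
organic_matter = [0.0, 2.5, 11.3]      # e.g. percent
biomass        = [150.0, 0.0, 980.0]   # arbitrary units

# log(x + 1) keeps zeros finite: log1p(0) == 0.
log_organic_matter = log1p.(organic_matter)
log_biomass        = log1p.(biomass)

# Samples x variables matrix of pre-normalized metadata.
table = hcat(log_organic_matter, log_biomass)
```

The transformed columns would then be supplied as metadata, while the raw count tables are left for FlashWeave to normalize.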

Hi, thank you! So if I understand correctly, I would input the ASV tables as usual (raw, to undergo CLR normalization within FlashWeave) and the non-sequencing variables (which I pre-normalize outside FlashWeave) as metadata. In that case, both ASVs and metadata variables will be considered nodes of the network, is that correct?

These are my network parameters: alpha = 0.01, fdr = true, sensitive = true, feed_forward = true, max_k = 2, max_tests = 10000000, conv = 0.01, make_sparse = true, n_obs_min = 2

As a side note, I have a question about make_sparse. When I run this network with make_sparse=true, in some cases the verbose output shows sparse - false:

Run information:
sensitive - true
heterogeneous - false
max_k - 2
alpha - 0.01
sparse - false
workers - 1
OTUs - 20392
MVs - 0

I also get this warning :
-> multiple data sets provided, using separate normalization mode
┌ Warning: Adaptive CLR is inefficient with sparse data, using dense format for normalization
└ @ FlashWeave ~/.julia/packages/FlashWeave/j91Ng/src/preprocessing.jl:542

Why is this?

AnnaClo, Feb 03 '25 11:02

Yes, ASVs and meta variables will all be nodes in the network. Regarding sparse: for your combination of flags, FlashWeave uses an adaptive CLR normalization scheme which replaces 0s with adaptive pseudocounts. Hence, the table is no longer dominated by 0s, and sparsity is turned off for efficiency. The warning you posted tries to convey that.

jtackm, Feb 17 '25 09:02
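The effect described in this last reply can be illustrated with a toy example. The sketch below uses a fixed pseudocount of 1 as a stand-in for FlashWeave's adaptive pseudocounts (an assumption made purely for illustration), and `clr_row` is a hypothetical helper, not a FlashWeave function.

```julia
using SparseArrays, Statistics

# Toy sparse count table (rows = samples, columns = taxa); most entries are 0.
counts = sparse([1, 2, 3, 3], [1, 3, 2, 4], [12.0, 7.0, 30.0, 4.0], 3, 5)

# CLR of one sample with a fixed pseudocount (stand-in for adaptive pseudocounts).
function clr_row(row; pseudocount=1.0)
    logs = log.(row .+ pseudocount)
    return logs .- mean(logs)
end

# Normalize a dense copy of the table row by row.
dense_clr = mapslices(clr_row, Matrix(counts); dims=2)

# Fraction of exact zeros before and after normalization.
zero_fraction_before = count(iszero, counts) / length(counts)
zero_fraction_after  = count(iszero, dense_clr) / length(dense_clr)
```

Before normalization most entries are zero, so a sparse format saves memory; after it, no entry in this example is exactly zero, and a sparse representation would no longer offer any benefit.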