wort

Calculating SAC on metagenome clusters

Open · nmb85 opened this issue 3 years ago · 3 comments

@luizirber, one more thing for today (not intending to distract you): it would be really interesting to calculate the species accumulation curve (SAC) for hash sets in clusters of metagenomes in your monster wort database. For example, treating soil metagenomes as one cluster, you could build a matrix of hashes (such as here), calculate the different orders of intersection between the soil metagenomes' hash sets, and then plot an SAC from the hashes.

Doing this with raw k-mers might be impossible, and species tallies are skewed by incomplete annotation because the reference databases are incomplete. Hashes, however, might let you get an accurate SAC by incrementally adding hash sets and tracking how the intersection sets change. See equation 3 in this paper for a definitive explanation.

With that, you could efficiently use all the data in the SRA and JGI databases to estimate whether the species count for the current soil metagenomes is "open" (the SAC fits a power-law function) or "closed" (the SAC fits an exponential function), that is, whether or not we've collected enough data to estimate an asymptote for the number of species (using hashes as a proxy) in soil metagenomes or some other interesting biome. Although I'm not a soil biologist, I think that's a major question in their field, and other biomes might be interesting too. I'm not sure if anyone has tried this with raw k-mers, but it seems like too gargantuan a task. Could hashes make this problem tractable?
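A minimal Python sketch of the idea, under assumptions that are mine and not from wort or sourmash. Each sample in a cluster is represented only by its set of integer hash values (for example from scaled, FracMinHash-style sketches, where the union of two hash sets approximates the sketch of the union of the underlying k-mer sets). The SAC is taken as the mean number of distinct hashes after adding 1, 2, ..., N samples, averaged over random sample orderings. "Open" vs "closed" is decided by comparing a power law S(N) = kappa * N^gamma against a saturating exponential S(N) = s_max * (1 - exp(-r * N)). The function names, the grid of `r` values and the squared-error comparison are hypothetical choices for illustration; this is not equation 3 of the linked paper.

```python
import math
import random


def accumulation_curve(hash_sets, n_permutations=100, seed=0):
    """Return the mean number of distinct hashes seen after adding
    1, 2, ..., N samples, averaged over random sample orderings."""
    rng = random.Random(seed)
    n = len(hash_sets)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(order)
        seen = set()
        for k, idx in enumerate(order):
            seen |= hash_sets[idx]  # union with the next sample's hashes
            totals[k] += len(seen)
    return [t / n_permutations for t in totals]


def fit_power_law(curve):
    """Fit S(N) = kappa * N**gamma by least squares on log S vs log N.
    Needs at least two points and strictly positive counts."""
    xs = [math.log(n) for n in range(1, len(curve) + 1)]
    ys = [math.log(s) for s in curve]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(my - gamma * mx)
    return kappa, gamma


def fit_saturating_exponential(curve):
    """Fit S(N) = s_max * (1 - exp(-r * N)): grid search over r, with the
    least-squares s_max solved in closed form for each candidate r.
    Returns (sse, s_max, r) for the best r on the grid."""
    ns = range(1, len(curve) + 1)
    best = None
    for k in range(-60, 21):  # r from 1e-3 to 10, log-spaced
        r = 10 ** (k / 20)
        f = [1 - math.exp(-r * n) for n in ns]
        s_max = sum(s * fi for s, fi in zip(curve, f)) / sum(fi * fi for fi in f)
        sse = sum((s_max * fi - s) ** 2 for s, fi in zip(curve, f))
        if best is None or sse < best[0]:
            best = (sse, s_max, r)
    return best


def classify(curve):
    """Call the curve 'open' if the power law has a lower squared error on
    the raw counts than the saturating exponential, else 'closed'."""
    kappa, gamma = fit_power_law(curve)
    sse_power = sum((kappa * n ** gamma - s) ** 2
                    for n, s in enumerate(curve, start=1))
    sse_exp, s_max, r = fit_saturating_exponential(curve)
    verdict = "open" if sse_power < sse_exp else "closed"
    return verdict, {"kappa": kappa, "gamma": gamma, "sse_power": sse_power,
                     "s_max": s_max, "r": r, "sse_exp": sse_exp}


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy "closed" cluster: every sample draws 80 of the same 100 hashes.
    closed = [set(rng.sample(range(100), 80)) for _ in range(20)]
    # Toy "open" cluster: every sample brings 100 hashes no other sample has.
    opened = [set(range(i * 100, (i + 1) * 100)) for i in range(20)]
    for name, sets in [("closed", closed), ("open", opened)]:
        verdict, fit = classify(accumulation_curve(sets))
        print(name, verdict, round(fit["gamma"], 3))
```

On the toy data, the disjoint samples grow linearly (gamma close to 1) and come out "open", while the samples drawn from a shared pool of 100 hashes saturate and come out "closed". Comparing raw sums of squares is crude, since the power law is fit in log space and the models are not penalized for parameters; an information criterion such as AIC would be a more careful choice.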

nmb85 · Sep 04 '20, 20:09

That is a really good idea... and a monstrous matrix :rofl:

I'll work on sharing all the sigs in a couple of weeks, but it is not something I can tackle at the moment :cry:

luizirber · Sep 05 '20, 16:09

Yes! We explored this quite a bit a while back for Tara; see https://github.com/ctb/2017-sourmash-rarefy/blob/master/tara-rarefy.ipynb for an example. Haven't looked at the code in a while, though ;).

ctb · Sep 05 '20, 16:09

Have you already seen this? https://ieeexplore.ieee.org/abstract/document/9139876

nmb85 · Nov 19 '20, 04:11