AncientMetagenomeDir
Improve definition for 'metagenomic sample'
https://spaam-community.slack.com/archives/C0183TC8B0R/p1667987904955569
To summarise, I was unsure whether to include #1001 or not.
It is a sedaDNA sample; however, the ultimate purpose of the paper was population genetics of particular eukaryotic species rather than metagenomic analysis. That said, they did upload shotgun data, and they ran Centrifuge (a metagenomic classifier) as a screening step.
After the discussion on Slack, the overall consensus was to include it. However, there was also some discussion about improving our definition of a metagenome and clarifying what to include and how.
The following suggestion from @pheintzman was made:
Perhaps a definition of a metagenomics sample could be something like: “An ancient sample that is either expected or has been shown to contain genomic information from multiple taxa”
Where by 'expected' would be environmental deposits (sediments, dental calculus, etc), and 'shown' would be tissue from individuals where pathogens and/or microbiomes were found. And that [this would] avoid[s] the rabbit hole of including any and all ancient samples, which may well, but not necessarily, be metagenomic.
However, @aidaanva and I pointed out that this doesn't really work with what we count in the host-associated metagenome table,
because [in some cases] a tooth may have a very minimal multi-taxa presence still. The oral signal [can be] present (even if [highly] skewed), so we need to be a bit more precise I think
By 'more precise' I mean that, for the host-associated microbiome, teeth may hold the oral signal, but it is so skewed that you wouldn't want to use it for 'ecological' analyses.
When asked whether technically any 'ancient' sample would be metagenomic, @alexhbnr also provided a further refinement, comparing to a bone: "So we expect a similar signal as for a sediment sample that was studied for microbes and not mammals."
Our current definition:
We define here 'metagenome' in a broad sense, primarily focusing on any data where the whole DNA content is analysed and explored. Examples for this are (but not limited to) ancient microbiomes (host associated metagenome), ancient sedimentary DNA (environmental) and also samples used for ancient pathogen screening (single genomes).
I sort of like what we already have, and that technically includes Gelabert... I wonder if we could improve it slightly by saying something like:
where the whole DNA content is analysed and explored at some point during a scientific publication?
Maybe instead something like "we explicitly exclude host-derived samples for which the host DNA was the primary focus of the study"? Because I don't think that "where the whole DNA content is analysed and explored" always applies to pathogen capture data. Isn't the metagenomic analysis for these often just to look for signals of the species of interest, with the focus then on the reads from that species?
Or "sole focus" rather than "primary focus"
Yes, that's correct, I like your idea of:
we explicitly exclude host-derived samples for which the host DNA was the sole focus of the study
However, to clarify: this would only apply to host-associated samples, right? Not sedimentary samples?
Right, not to sedimentary samples
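To make the proposed wording concrete, here is a minimal sketch of the exclusion as a check. The `Sample` class and its field names are hypothetical, made up for illustration; they are not columns in the AncientMetagenomeDir tables, and this only covers the exclusion clause, not the broader definition.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical fields for illustration only.
    name: str
    host_associated: bool      # False for environmental/sedimentary samples
    host_dna_sole_focus: bool  # True if the study analysed only the host DNA


def excluded_by_host_rule(sample: Sample) -> bool:
    """Return True if the proposed exclusion clause would drop this sample."""
    # The clause applies only to host-associated samples,
    # never to sedimentary samples.
    if not sample.host_associated:
        return False
    # Exclude only when host DNA was the sole focus of the study.
    return sample.host_dna_sole_focus


# A sediment sample used for population genetics is not excluded by this clause.
sediment = Sample("sediment", host_associated=False, host_dna_sole_focus=True)
# A tooth sequenced purely for the host genome would be excluded.
tooth_host_only = Sample("tooth", host_associated=True, host_dna_sole_focus=True)
# A tooth that was also screened for pathogens would not be excluded.
tooth_screened = Sample("tooth", host_associated=True, host_dna_sole_focus=False)
```

Using "sole focus" rather than "primary focus" means a host-associated sample stays in scope as soon as anything beyond the host DNA was analysed, which matches the pathogen screening case discussed above.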