sourmash
use 'detection' terminology for fraction-of-genome-kmers-found
At STAMPS 2022, I described the fraction of genome k-mers found (p_match in the gather text output, f_match_query in the prefetch and gather CSV output) as the genomic "extent". I have since learned that "detection" is a different, or additional, term for the same thing.
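As a rough illustration of the quantity being named, here is a minimal Python sketch that computes the fraction of a genome's distinct k-mers that are also found in a metagenome. It uses exact sets of all k-mers, whereas sourmash works on subsampled FracMinHash sketches and canonical k-mers, so this is not how p_match or f_match_query is actually computed. The function names kmers and kmer_detection and the toy sequences are hypothetical, not part of the sourmash API.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all distinct k-length substrings of seq.

    Simplification: ignores reverse complements and does no subsampling.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_detection(genome: str, metagenome: str, k: int = 21) -> float:
    """Fraction of the genome's distinct k-mers that also occur in the metagenome."""
    genome_kmers = kmers(genome, k)
    if not genome_kmers:
        # Genome shorter than k: nothing to detect.
        return 0.0
    found = genome_kmers & kmers(metagenome, k)
    return len(found) / len(genome_kmers)


# Toy example with k=4: the genome has 5 distinct 4-mers
# (ACGT, CGTT, GTTG, TTGC, TGCA); 3 of them appear in the "metagenome".
genome = "ACGTTGCA"
metagenome = "ACGTTG"
print(kmer_detection(genome, metagenome, k=4))  # 0.6
```

A detection of 0.6 here means 60% of the genome's k-mers were found, regardless of how many times each one occurs.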
I found this reference, which seems to use the term: https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-019-0690-0
I am still searching for more.
As part of the documentation revamp (e.g. https://github.com/sourmash-bio/sourmash/issues/1289), we could switch to using 'detection', which is growing on me as a term.
Per Mike Lee:
I first came across it from anvi'o when that first came out, though it’s not described in the initial paper. The only place I think it’s documented is actually a page I put together for the anvi'o site 5 years ago covering a few of its terms. For detection, it’s written here in the context of contigs, with an example visualization of what you already understand: https://merenlab.org/2017/05/08/anvio-views/#detection
But the generalized definition would just be something like: The proportion of a given reference sequence that is covered/matched/identified at least 1X.
Or more straightforward: The proportion of a given reference sequence that is detected.
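The generalized definition above can be sketched for base-level coverage as well: given per-position read depths across a reference sequence, detection is the proportion of positions covered at least 1X. This is a minimal sketch, not taken from anvi'o or sourmash; the function name detection and the depth values are hypothetical toy inputs.

```python
def detection(coverage: list[int]) -> float:
    """Proportion of reference positions with read depth of at least 1X."""
    if not coverage:
        # Empty reference: define detection as 0.
        return 0.0
    covered = sum(1 for depth in coverage if depth >= 1)
    return covered / len(coverage)


# Made-up per-position depths for a 10 bp reference:
# 6 of the 10 positions have depth >= 1.
depths = [0, 3, 5, 0, 1, 2, 0, 0, 4, 1]
print(detection(depths))  # 0.6
```

Note that detection only asks whether each position is covered, not how deeply, so it is distinct from mean coverage.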
The first time [Meren] used it for a genome-level purpose, and defined it, was in this paper: https://peerj.com/articles/4320/