
Application of `f_unique_to_query` and `threshold_bp`

Open Amanda-Biocortex opened this issue 9 months ago • 5 comments

Hi,

I am using sourmash to profile bacterial composition and abundance in shotgun WGS stool samples, and I have two questions:

Could you expand on what you mean by this statement about the `f_unique_to_query` column? 'This column should be used in any analysis that needs to avoid double-counting matches.' Currently, I am using all the rows in the output table. Am I double-counting by not 'using' `f_unique_to_query`?
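To make the double-counting question concrete, here is a minimal sketch of how I understand the difference between the columns. It assumes that `f_orig_query` is the fraction of the original query shared with each match (so overlaps between matches are counted once per match), while `f_unique_to_query` only counts the part of the query not already assigned to a higher-ranked match. The table, genome names, and numbers below are made up for illustration and are not real sourmash output.

```python
import csv
import io

# Toy gather-style table. The values are invented for illustration only;
# a real `sourmash gather` CSV contains many more columns.
TOY_GATHER_CSV = """name,f_orig_query,f_unique_to_query
genome_A,0.60,0.60
genome_B,0.55,0.10
genome_C,0.20,0.05
"""


def column_sum(csv_text, column):
    """Sum one numeric column of a CSV given as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[column]) for row in reader)


# Summing the overlap with the original query counts shared k-mers
# once per match, so the total can exceed 1.0 (100% of the query).
total_overlap = column_sum(TOY_GATHER_CSV, "f_orig_query")

# Summing the unique fractions counts each part of the query at most once,
# so the total stays at or below 1.0.
total_unique = column_sum(TOY_GATHER_CSV, "f_unique_to_query")

print(f"sum of f_orig_query:      {total_overlap:.2f}")
print(f"sum of f_unique_to_query: {total_unique:.2f}")
```

If that reading is right, keeping every row is fine, and the double-counting would only come from which column is summed across rows.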

My current parameters are k=31, s=1000, threshold_bp=2000. In your experience, will this low threshold return a very high number of false positives?

Many thanks

Amanda-Biocortex, May 13 '24 10:05