
Understanding the -k parameter

Open antunderwood opened this issue 9 months ago • 1 comments

I am trying to understand how to adjust -k, since it seems to affect the results dramatically. For example, with primers of length 25 and -k 15 I get no amplicons. I think this is the same as changing the "Number of bp from the 3' End of the Primer Used to Search for Matches (>14)" parameter on the Silica website: https://www.gear-genomics.com/silica/index.html?UUID=0101f455-249c-4e7f-beb5-63f2f50a7501

If I change this to -k 20, or set the same parameter to 20 on the website, I get an amplicon: https://www.gear-genomics.com/silica/index.html?UUID=c40f28ba-5abd-420a-8d45-6667d72de0fe

antunderwood, Sep 20 '23 08:09

-k sets the initial k-mer length. With default options, dicey enumerates the neighborhood of this k-mer at edit distance 1 and then searches the genome for all exact matches of these k-mers. For non-unique 15-mers that occur abundantly in the genome, this search may return far more than 10,000 matches, which dicey then evaluates as possible binding sites. To speed up this search, the online version evaluates at most 10,000 matches. On the command line, you can force dicey to evaluate all matches using -m 10000000, but this may result in a long runtime. A 20-mer is more likely to be unique in the genome, so fewer candidate hits need to be evaluated.
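To make the idea concrete, here is a rough Python sketch of the seed-and-search step described above. It is not dicey's actual code: the function names, the toy primer, and the toy genome are made up for illustration, the neighborhood is simplified to single substitutions (Hamming distance 1) rather than full edit distance, and taking the seed from the 3' end of the primer is an assumption based on the Silica parameter name.

```python
ALPHABET = "ACGT"


def hamming1_neighborhood(kmer):
    """Return the k-mer itself plus every sequence one substitution away.

    Simplification: indels are ignored, so this is Hamming distance 1,
    not full edit distance 1.
    """
    neighbors = {kmer}
    for i, base in enumerate(kmer):
        for alt in ALPHABET:
            if alt != base:
                neighbors.add(kmer[:i] + alt + kmer[i + 1:])
    return neighbors


def find_exact_matches(genome, kmers, max_matches):
    """Collect (position, k-mer) exact matches, stopping at max_matches.

    max_matches is a hypothetical stand-in for a cap on evaluated hits.
    """
    hits = []
    for kmer in sorted(kmers):
        start = genome.find(kmer)
        while start != -1:
            hits.append((start, kmer))
            if len(hits) >= max_matches:
                return hits  # cap reached: remaining candidates are never seen
            start = genome.find(kmer, start + 1)
    return hits


# Hypothetical 25 bp primer; the seed is assumed to be its last k bases.
primer = "ACGTTGCAAGCTTGCATGCCTGCAG"
for k in (15, 20):
    seed = primer[-k:]
    print(k, len(hamming1_neighborhood(seed)))  # 1 + 3*k candidates per seed
```

With substitutions only, each seed expands to 1 + 3k candidate k-mers. The shorter the seed, the more often each candidate occurs in a large genome, so a cap on evaluated matches is more likely to cut the search off before the true binding site has been reached.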

tobiasrausch, Sep 20 '23 08:09