Two modes --fast (default) and --sensitive in strobealign?

Open ksahlin opened this issue 3 years ago • 0 comments

Seeding

At the seeding step, the current algorithm selects, for each syncmer (the start strobe), a downstream strobe from the set of syncmers in a downstream window. This inevitably produces some redundancy in downstream strobe sampling (see attached figure for an example).

We could instead, for each syncmer start strobe, select a k-mer from the window of downstream k-mers in "nucleotide space" (i.e. not in syncmer space).

The envisioned outcome is more random seeds, hence more matches, hence higher accuracy. The tradeoff is a slower seeding step, since the windows have to be larger to cover the same range in nucleotide space (in expectation 5x more candidate second strobes when syncmers are sampled with a density of 1/5).
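To make the 5x figure concrete, here is a rough Python sketch (not strobealign's actual code) that counts second-strobe candidates over the same nucleotide range in both spaces. Everything in it is an assumption for illustration: the open-syncmer parameters `k=20`, `s=16`, `t=2` (chosen so that the density is about 1/(k-s+1) = 1/5), the window offsets `w_min=25` and `w_max=75`, the MD5-based hash, and the function names.

```python
import bisect
import hashlib
import random


def smer_hash(smer):
    """Deterministic 64-bit hash of an s-mer (a stand-in, not strobealign's hash)."""
    return int.from_bytes(hashlib.md5(smer.encode()).digest()[:8], "big")


def open_syncmers(seq, k=20, s=16, t=2):
    """Start positions of k-mers whose smallest s-mer (by hash) sits at offset t."""
    hashes = [smer_hash(seq[i:i + s]) for i in range(len(seq) - s + 1)]
    n_smers = k - s + 1  # number of s-mers inside one k-mer
    positions = []
    for i in range(len(seq) - k + 1):
        window = hashes[i:i + n_smers]
        if window.index(min(window)) == t:
            positions.append(i)
    return positions


def candidate_ratio(seq, k=20, s=16, t=2, w_min=25, w_max=75):
    """Ratio of candidate second strobes: all k-mers vs. syncmers only,
    summed over every start syncmer, for the same downstream nucleotide range."""
    syncmers = open_syncmers(seq, k, s, t)
    last_start = len(seq) - k  # last valid k-mer start position
    nt_total = 0
    sync_total = 0
    for p in syncmers:
        lo = p + w_min
        hi = min(p + w_max, last_start)
        if lo > hi:
            continue  # window falls off the end of the sequence
        # Nucleotide space: every k-mer start in [lo, hi] is a candidate.
        nt_total += hi - lo + 1
        # Syncmer space: only sampled syncmers in [lo, hi] are candidates.
        sync_total += bisect.bisect_right(syncmers, hi) - bisect.bisect_left(syncmers, lo)
    return nt_total / sync_total


random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(20000))
ratio = candidate_ratio(seq)
print(f"nucleotide-space / syncmer-space candidates: {ratio:.2f}")
```

On a random sequence the ratio should land near 1/density, i.e. around 5 with these assumed parameters, which is the extra work per start strobe that a sensitive mode would pay for.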

Candidate site testing

In sensitive mode, we could also increase the number of candidate sites tested, using -M 40 instead of the default -M 20.
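As a toy illustration of what such a cap does, the sketch below keeps only the top `max_tries` candidate sites. It assumes candidates are ranked by a seed-hit score before being tested; that ranking, the `sites_to_test` helper, and the example data are all hypothetical and not taken from strobealign.

```python
def sites_to_test(candidates, max_tries):
    """Return at most max_tries candidate sites, highest score first.

    candidates: list of (reference_position, score) tuples (hypothetical format).
    """
    ranked = sorted(candidates, key=lambda site: site[1], reverse=True)
    return ranked[:max_tries]


# 30 made-up candidate sites with scores 30, 29, ..., 1.
candidates = [(index * 1000, score) for index, score in enumerate(range(30, 0, -1))]

fast = sites_to_test(candidates, 20)       # cap corresponding to -M 20
sensitive = sites_to_test(candidates, 40)  # cap corresponding to -M 40

print(len(fast), len(sensitive))
```

With 30 candidates, the cap of 20 drops the ten lowest-scoring sites, while the cap of 40 tests all of them. The higher cap only matters for reads that have more than 20 candidate sites.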

Current strobe creation

See some redundancies in the second strobe sampling below.

Nov 01 '22 13:11 ksahlin