vsearch icon indicating copy to clipboard operation
vsearch copied to clipboard

vsearch search_global time complexity

Open jianshu93 opened this issue 2 years ago • 2 comments

Dear vsearch team,

I am curious what is the theoretical time complexity for search_global algorithm related to database sequence size (considering the same length for both database sequence and query sequence). Heuristics used in the command is based on kmer filtering right, similar to that of usearch search_global? It would be interesting to know the worst case or best case or average case et.al.

Thanks,

Jianshu

jianshu93 avatar Jun 09 '22 00:06 jianshu93

USEARCH has both usearch_global (k-mer indexing + alignment) and search_global (alignment only).

VSEARCH only includes usearch_global, but you can also configure it to do the same search:

If --maxaccepts and --maxrejects are both set to 0, the complete database is searched


Heuristics used in the command is based on kmer filtering right, similar to that of usearch search_global?

Nope, no heuristics and no filtering, so the full database is searched. Search time per-query increases linearly with database size. For this comprehensive and deterministic search, speed is always the same: slow 🐌

This is why settings like --maxaccepts 1024 --maxrejects 1024 should be almost as good and a lot faster! Here, k-mer indexing time increases with database size and alignment time increases with maxaccepts for similar reads and with maxrejects for divergent reads.

colinbrislawn avatar Sep 24 '22 16:09 colinbrislawn

Thanks for the response! This is helpful!

Thanks,

Jianshu

jianshu93 avatar Sep 24 '22 16:09 jianshu93