
Maximum limit on reference and query string length

Open cjain7 opened this issue 3 years ago • 0 comments

Many thanks for providing this library!

I'm using the following code snippet as part of my application

csa_wt<> fm_index;
construct_im(fm_index, "mississippi!", 1);
std::cout << "'si' occurs " << count(fm_index,"si") << " times.\n";

Instead of "mississippi!", however, my text is a string of about 60 billion characters. It is built by concatenating long reads (20x coverage of the human genome), using the "$" symbol as a separator. My query sequences are also long reads, whose lengths can exceed 1M nucleotide characters. My overall codebase is misbehaving: it runs to completion but produces incorrect results. I am only starting to debug this now, but I wonder whether I am exceeding a string length limit in sdsl-lite. The same codebase works fine on smaller datasets derived from bacterial genomes.
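To make the size concern concrete, here is a minimal sketch of two sanity checks that do not depend on sdsl-lite at all. The helper names `fits_in_uint32` and `naive_count` are hypothetical and exist only for illustration; nothing here is a statement about how sdsl-lite stores positions internally. The first helper only shows that a 60-billion-character text cannot be addressed with 32-bit positions, so any 32-bit integer in the surrounding pipeline would overflow. The second is a brute-force reference count that could be compared against `count(fm_index, pattern)` on a small sample of the data.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: true if a text of n characters can be addressed
// with 32-bit unsigned positions (maximum value 4,294,967,295).
constexpr bool fits_in_uint32(std::uint64_t n) {
    return n <= std::numeric_limits<std::uint32_t>::max();
}

// Hypothetical helper: brute-force count of (possibly overlapping)
// occurrences of pattern in text, for cross-checking an index on small inputs.
std::uint64_t naive_count(const std::string& text, const std::string& pattern) {
    if (pattern.empty() || pattern.size() > text.size()) {
        return 0;
    }
    std::uint64_t hits = 0;
    // Advance by one position after each hit so overlapping matches are counted.
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        ++hits;
    }
    return hits;
}
```

For the toy text above, `naive_count("mississippi!", "si")` gives 2, and `fits_in_uint32(60000000000ULL)` is false. Running the same comparison on progressively larger prefixes of the real text is one way to find the size at which the results start to diverge.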

I'm using the latest code from master branch (commit: c32874c).

Thanks!

Posted by cjain7 on Sep 10 '21 at 04:09