Feature Request: Build FM index for a set of strings

Open lutteropp opened this issue 8 years ago • 1 comments

Hello,

I want to use the csa_wt index to count DNA substrings of variable length in a file containing DNA sequencing reads. The file looks like this, with the reads separated by newline characters:

ACCGTATTTAGCACTGATCGATCGATC
AAGGTCGATCGATCGATCACT
AAACTACGATCGATCGTACATGCA

Is there a way to tell csa_wt that suffixes spanning a newline character should be ignored in order to speed up the lookup and further reduce the size of the FM index?
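
To make the request concrete, here is a minimal sketch of the intended counting semantics. It uses only the C++ standard library and a plain, uncompressed suffix array as a stand-in; it is not sdsl-lite's csa_wt and makes no claim about how the library works internally. The names `ReadIndex`, `build_index` and `count_occurrences` are hypothetical. Each suffix is truncated at the end of its read, and suffixes that start at a separator are dropped, which is roughly what "ignoring suffixes that span a newline" would mean.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the index: an uncompressed suffix array,
// NOT sdsl-lite's csa_wt.
struct ReadIndex {
    std::string text;             // all reads, each terminated by '\n'
    std::vector<std::size_t> sa;  // start positions of the kept suffixes, sorted
};

// Build the index. Suffixes starting at a separator are skipped, and every
// other suffix is compared only up to the end of its own read.
ReadIndex build_index(const std::vector<std::string>& reads) {
    ReadIndex idx;
    for (const std::string& r : reads) {
        idx.text += r;
        idx.text += '\n';
    }
    for (std::size_t i = 0; i < idx.text.size(); ++i) {
        if (idx.text[i] != '\n') idx.sa.push_back(i);
    }
    const std::string& t = idx.text;
    // Length of the suffix at pos once it is cut at the end of its read.
    auto key_len = [&t](std::size_t pos) { return t.find('\n', pos) - pos; };
    std::sort(idx.sa.begin(), idx.sa.end(),
              [&](std::size_t a, std::size_t b) {
                  return t.compare(a, key_len(a), t, b, key_len(b)) < 0;
              });
    return idx;
}

// Count occurrences of pattern inside single reads via binary search.
std::size_t count_occurrences(const ReadIndex& idx, const std::string& pattern) {
    assert(!pattern.empty() && pattern.find('\n') == std::string::npos);
    const std::string& t = idx.text;
    // Compare the first |pattern| characters of the truncated suffix at pos
    // with the pattern: negative if smaller, zero on a match, positive if larger.
    auto cmp = [&](std::size_t pos) {
        std::size_t len = std::min(t.find('\n', pos) - pos, pattern.size());
        return t.compare(pos, len, pattern);
    };
    auto lo = std::lower_bound(
        idx.sa.begin(), idx.sa.end(), pattern,
        [&](std::size_t pos, const std::string&) { return cmp(pos) < 0; });
    auto hi = std::upper_bound(
        lo, idx.sa.end(), pattern,
        [&](const std::string&, std::size_t pos) { return cmp(pos) > 0; });
    return static_cast<std::size_t>(hi - lo);
}
```

With the three reads above, `count_occurrences(idx, "GATCGATC")` gives 5, while `count_occurrences(idx, "ATCAAG")` gives 0, because that string would only occur across the boundary between the first and second read. A pattern over A, C, G, T never contains the newline itself, so counts on the plain concatenation are already correct; the request is about saving space and lookup time, not about changing the results.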

lutteropp, Nov 04 '16 12:11

I also think this would be very helpful :+1:

ekg, Dec 08 '17 11:12