sdsl-lite
Feature Request: Build FM index for a set of strings
Hello,
I want to use the csa_wt index to count DNA substrings of variable length in a file of DNA sequencing reads. The reads are separated by newline characters, so the file looks like this:
ACCGTATTTAGCACTGATCGATCGATC
AAGGTCGATCGATCGATCACT
AAACTACGATCGATCGTACATGCA
Is there a way to tell csa_wt that suffixes spanning a newline character should be ignored in order to speed up the lookup and further reduce the size of the FM index?
I also think this would be very helpful :+1: