WFA
A few questions about my application (changing attributes, ultralow endsfree, endsfree API, max_score heuristics)
Hi!
First of all, thank you a lot for providing this intriguing and well-documented piece of software.
I am working on an experimental longread aligner and currently testing WFA2 as the alignment backend. While doing so, I encountered a few issues.
- As recommended, I want to reuse the `wavefront_aligner_t` objects. However, I need to change some of the attributes before each alignment invocation. Which, if any, of the attributes are safe to change on the `wavefront_aligner_t` object (i.e. after calling `wavefront_aligner_new`)? The most important ones for me are the `*_begin/end_free` values of the `alignment_span`. Next would be the `alignment_scope`, but here I could also store two aligners instead of changing it. Finally, I was thinking about varying the `memory_mode` depending on the size of the input sequences. From looking at the code of `wavefront_aligner_new`, it seems like the `*_begin/end_free` values might be safe to change after construction, but the other attributes mentioned are not. Is this correct?
- The longreads in my data sets are up to 600K bases long and I need ends free alignment. I saw in another issue that you recommended the `ultralow` memory mode for input sequences of this size. There is no urgency whatsoever, but it would be nice for me to have the `ultralow` mode available for ends free alignments.
- What I want to do is a glocal alignment, like the below example from your README. For my application, I need two pieces of information.
  - I need the index of the text where the actual alignment begins, without free end gaps. In the example, it would be 13. I initially expected the value `wf_aligner->cigar->begin_offset` to be exactly this, but it is 0 in the example (`end_offset` accordingly is 50). So my question is, what exactly are the `begin/end_offset` values and how do I (conveniently) get the value I want? Is there an easier option than looking at the CIGAR and doing the offset calculations?
  - I need the CIGAR, also excluding free end insertions. In the example, it would be "9=1X12=", but the API returns "13I9=1X12=15I". It is of course possible to trim the CIGAR by hand, but it would be nicer to have something like `cigar_sprint_SAM_CIGAR_endsfree`. I found the `cigar_maxtrim_gap_linear` function, but it only trims the gaps from the end.
    PATTERN  -------------AATTTAAGTCTAGGCTACTTTC---------------
                          ||||||||| ||||||||||||
    TEXT     ACGACTACTACGAAATTTAAGTATAGGCTACTTTCCGTACGTACGTACGT
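For reference, the manual trimming I have in mind looks roughly like the following minimal sketch. It is plain C that only works on the SAM-style CIGAR string; `cigar_trim_free_ends` is a hypothetical helper name, not a WFA2 function, and it assumes the free end gaps show up as a single leading and a single trailing run of one operation (`I` in the example above).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (NOT part of the WFA2 API).
 * Strips one leading and one trailing run of `gap_op` from a SAM-style
 * CIGAR string such as "13I9=1X12=15I" and writes the result to `out`.
 * Returns the length of the leading run, i.e. the 0-based position in the
 * gapped sequence where the actual alignment starts, or -1 if `out` is
 * too small to hold the trimmed CIGAR.
 */
long cigar_trim_free_ends(const char* cigar, char gap_op,
                          char* out, size_t out_size) {
  assert(cigar != NULL && out != NULL);
  const char* begin = cigar;
  const char* end = cigar + strlen(cigar);
  long leading = 0;

  /* Leading run: parse "<length><op>" and skip it if op is the gap op. */
  char* after_num = NULL;
  const long len = strtol(begin, &after_num, 10);
  if (after_num != begin && *after_num == gap_op) {
    leading = len;
    begin = after_num + 1;
  }

  /* Trailing run: if the last op is the gap op, walk back over its digits. */
  if (end > begin && end[-1] == gap_op) {
    const char* p = end - 1;
    while (p > begin && isdigit((unsigned char)p[-1])) --p;
    if (p < end - 1) end = p; /* p is the first digit of the last operation */
  }

  /* Copy the remaining operations into the output buffer. */
  const size_t n = (size_t)(end - begin);
  if (n + 1 > out_size) return -1;
  memcpy(out, begin, n);
  out[n] = '\0';
  return leading;
}
```

On the example, calling it with "13I9=1X12=15I" and `'I'` writes "9=1X12=" to `out` and returns 13, which is the text index I am after.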
- I need the optimal alignment, but I have an upper bound on the number of allowed errors (= maximum score in edit distance). I did not fully investigate the heuristic options yet, but I believe a banded alignment would be fine for this case, maybe even adaptive? Again, there is no urgency whatsoever, but it would be lovely to have a convenience config attribute like `max_score`. It would automatically use all heuristics that preserve the optimal score below the configured `max_score`. Just an idea from lazy me :D
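To make the `max_score` idea a bit more concrete, here is a minimal sketch in plain C. It is not WFA2 code, `bounded_edit_distance` is a hypothetical name, and it covers only global (end-to-end) edit distance, not the ends free case. It relies on the fact that an alignment with at most `max_score` edits never leaves the diagonals `|i - j| <= max_score`, so a static band of that width still yields the exact score whenever the score is within the bound.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical illustration (NOT WFA2 code): global edit distance computed
 * only on the diagonals |i - j| <= max_score. Returns the exact distance if
 * it is <= max_score, and -1 otherwise (or on allocation failure).
 */
int bounded_edit_distance(const char* pattern, const char* text, int max_score) {
  assert(pattern != NULL && text != NULL && max_score >= 0);
  const int plen = (int)strlen(pattern);
  const int tlen = (int)strlen(text);
  if (abs(plen - tlen) > max_score) return -1; /* end cell lies outside the band */

  const int INF = INT_MAX / 2; /* "unreachable", safe to add small costs to */
  int* prev = malloc((size_t)(tlen + 1) * sizeof(int));
  int* curr = malloc((size_t)(tlen + 1) * sizeof(int));
  if (prev == NULL || curr == NULL) { free(prev); free(curr); return -1; }

  /* Cells to the right of the band are never written, so they stay INF. */
  for (int j = 0; j <= tlen; ++j) { prev[j] = INF; curr[j] = INF; }
  for (int j = 0; j <= tlen && j <= max_score; ++j) prev[j] = j;

  for (int i = 1; i <= plen; ++i) {
    const int lo = (i - max_score > 0) ? i - max_score : 0;
    const int hi = (i + max_score < tlen) ? i + max_score : tlen;
    if (lo > 0) curr[lo - 1] = INF; /* clear the stale cell left of the band */
    for (int j = lo; j <= hi; ++j) {
      if (j == 0) { curr[0] = i; continue; } /* first column: i deletions */
      int best = prev[j - 1] + (pattern[i - 1] != text[j - 1]); /* match/mismatch */
      if (prev[j] + 1 < best) best = prev[j] + 1;               /* gap in text */
      if (curr[j - 1] + 1 < best) best = curr[j - 1] + 1;       /* gap in pattern */
      curr[j] = best;
    }
    int* tmp = prev; prev = curr; curr = tmp; /* current row becomes previous */
  }

  const int score = prev[tlen];
  free(prev);
  free(curr);
  return (score <= max_score) ? score : -1;
}
```

For the aligned part of the README example (`AATTTAAGTCTAGGCTACTTTC` against `AATTTAAGTATAGGCTACTTTC`, a single mismatch), this returns 1 with `max_score = 1` and -1 with `max_score = 0`.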