WFA icon indicating copy to clipboard operation
WFA copied to clipboard

A few questions about my application (changing attributes, ultralow endsfree, endsfree API, max_score heuristics)

Open feldroop opened this issue 9 months ago • 3 comments

Hi!

First of all, thank you a lot for providing this intriguing and well-documented piece of software.

I am working on an experimental longread aligner and currently testing WFA2 as the alignment backend. While doing so, I encountered a few issues.

  1. As recommended, I want to reuse the wavefront_aligner_t objects. However, I need to change some of the attributes before each alignment invocation. Which, if any, of the attributes are safe to change on the wavefront_aligner_t object (i.e. after calling wavefront_aligner_new)? The most important one for me are the *_begin/end_free values of the alignment_span. Next would be the alignment_scope, but here I could also store two aligners instead of changing. Finally, I was thinking about varying the memory_mode depending on the size of the input sequences. From looking at the code of wavefront_aligner_new, it seems like the *_begin/end_free might be safe to change after construction, the other mentioned attributes not. Is this correct?

  2. The longreads in my data sets are up to 600K bases long and I need ends free alignment. I saw in another issue that you recommended the ultralow memory mode for input sequences of this size. There is no urgency whatsoever, but it would be nice for me to have the ultralow mode available for ends free alignments.

  3. What I want to do is a glocal alignment, like the below example from your README. For my application, I need two pieces of information.

    • I need the index of the text, where the actual alignment begins, without free end gaps. In the example, it would be 13. I initially expexted the value wf_aligner->cigar->begin_offset to be exactly this, but it is 0 in the example (end_offset accordingly is 50). So my question is, what exactly are the begin/end_offset values and how do I (conveniently) get the value I want? Is there an easier option than looking at the cigar and doing the offset calculations?
    • I need the CIGAR, also excluding free end insertions. In the example, it would be "9=1X12=", but the API returns "13I9=1X12=15I". It is of course possible to trim the cigar by hand, but it would be nicer to have something like cigar_sprint_SAM_CIGAR_endsfree. I found the cigar_maxtrim_gap_linear function, but it only trims the gaps from the end.
    PATTERN    -------------AATTTAAGTCTAGGCTACTTTC---------------
                            ||||||||| ||||||||||||
    TEXT       ACGACTACTACGAAATTTAAGTATAGGCTACTTTCCGTACGTACGTACGT
  1. I need the optimal alignment, but I have an upper bound on the number of allowed errors (=maximum score in edit distance). I did not fully investigate the heuristic options yet, but I believe a banded alignment would be fine for this case, maybe even adaptive? Again, there is no urgency whatsoever, but it would be lovely to have a convenience config attribute like max_score. It would automatically use all heuristics that preserve the optimal score below the configured max_score. Just an idea from lazy me :D

feldroop avatar May 15 '24 15:05 feldroop