
Feature requests: R bindings and early stopping

Open traversc opened this issue 1 year ago • 1 comments

Nice work!

Would it be possible to get R bindings for this? I put together a minimal example here: https://github.com/traversc/WavefrontAlignR (feel free to do whatever with it)

I'd also like to request an "early stopping" feature: if the best possible alignment distance exceeds a user-defined threshold, stop the alignment and return a flag value (like INT_MAX). Assuming this doesn't add too much overhead, it would be useful because I'm mostly interested in finding only highly similar sequences between two sets.
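To make the requested semantics concrete, here is a minimal sketch of a thresholded edit distance. It is a plain row-by-row dynamic program, not the wavefront algorithm and not WFA2 code; the function name `bounded_edit_distance` and its `max_dist` parameter are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <climits>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of WFA2-lib): unit-cost Levenshtein distance
// that returns INT_MAX as soon as the result is guaranteed to exceed max_dist.
int bounded_edit_distance(const std::string& a, const std::string& b, int max_dist) {
  const int n = static_cast<int>(a.size());
  const int m = static_cast<int>(b.size());

  // The length difference is a lower bound on the edit distance.
  if (std::abs(n - m) > max_dist) return INT_MAX;

  // Two rolling rows of the classic DP table.
  std::vector<int> prev(m + 1), curr(m + 1);
  for (int j = 0; j <= m; ++j) prev[j] = j;

  for (int i = 1; i <= n; ++i) {
    curr[0] = i;
    int row_min = curr[0];
    for (int j = 1; j <= m; ++j) {
      const int sub = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
      curr[j] = std::min({sub, prev[j] + 1, curr[j - 1] + 1});
      row_min = std::min(row_min, curr[j]);
    }
    // Every path to the final cell crosses this row and costs are non-negative,
    // so once the row minimum exceeds the threshold the result must too.
    if (row_min > max_dist) return INT_MAX;
    std::swap(prev, curr);
  }
  return prev[m] <= max_dist ? prev[m] : INT_MAX;
}
```

A caller would treat INT_MAX as "not similar enough" and skip the pair, for example `bounded_edit_distance(x, y, 3)` when only pairs within 3 edits matter.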

Last, I ran a quick benchmark comparing it against an existing R package. Is this a fair comparison? The code used to run WFA2 is here: https://github.com/traversc/WavefrontAlignR/blob/main/src/WFA_bindings.cpp

# Benchmark for a 10,000 x 10,000 alignment
# "seqs" is a vector of DNA sequences on average 43 bp long
library(WavefrontAlignR)
library(stringdist)
library(tictoc)

# WFA2 levenshtein
tic()
y1 <- WavefrontAlignR::edit_dist_matrix(seqs, seqs)
toc()
# 191.452 sec elapsed, 522324 alignments / sec

# stringdist levenshtein
tic()
y2 <- stringdist::stringdistmatrix(seqs, seqs, method = "lv", nthread=1)
toc()
# 677.356 sec elapsed, 147633 alignments / sec

traversc avatar Aug 08 '23 18:08 traversc

Sorry for the late reply (I was about to send this message, and then it slipped my mind...).

(1) R bindings

Yes, sure, that would be awesome. At the moment, I don't have the bandwidth to implement this feature, but it is definitely something I would like to have. Thanks for the example and the request.

If you feel like it, you could wrap your example under bindings/r (linked to the current version) and make a pull request. I would be very happy if you take over and take the credit for it. Only if you want to.

(2) Early stop

There is actually one already: the function wavefront_aligner_set_max_alignment_steps lets you set the maximum number of steps (i.e., the maximum alignment score) to reach before quitting. Have a look and let me know if that is what you are looking for.

Let me know, Thanks.

(3) (NxN) benchmark

In principle, it seems fair to me (edit distance, score only, ...).

smarco avatar Sep 25 '23 16:09 smarco