RapidFuzz

add SIMD support for long sequences

Open maxbachmann opened this issue 2 years ago • 1 comment

For sequences longer than 64 characters, it would still be possible to calculate the similarity for multiple sequences in parallel using SIMD. However, for very long sequences it might be faster to compare sequences individually, especially when a score_cutoff is specified.
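To make the 64-character boundary concrete, here is a minimal Python sketch of a bit-parallel Levenshtein distance in the Myers/Hyyrö style. It assumes (the issue does not say so) that the limit comes from storing one column of the DP matrix in a 64-bit word, so a longer sequence needs several words per SIMD lane. The names `WORD_BITS`, `words_needed` and `bitparallel_levenshtein` are hypothetical, and Python's unbounded integers stand in for the multi-word arithmetic.

```python
WORD_BITS = 64  # assumed width of one machine word / SIMD lane


def words_needed(pattern_len):
    """Number of 64-bit words one DP column would occupy (hypothetical helper)."""
    return max(1, -(-pattern_len // WORD_BITS))  # ceiling division


def bitparallel_levenshtein(pattern, text):
    """Levenshtein distance using one bit per pattern character."""
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1   # keeps the Python ints at m bits
    last = 1 << (m - 1)   # bit that tracks the bottom DP row
    # peq[ch] has bit i set when pattern[i] == ch
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    vp, vn, score = mask, 0, m  # vertical +1 / -1 deltas, current distance
    for ch in text:
        x = peq.get(ch, 0) | vn
        d0 = ((((x & vp) + vp) ^ vp) | x) & mask  # diagonal delta is zero
        hp = (vn | ~(d0 | vp)) & mask             # horizontal delta is +1
        hn = d0 & vp                              # horizontal delta is -1
        if hp & last:
            score += 1
        elif hn & last:
            score -= 1
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return score


print(words_needed(64), words_needed(65))            # 1 2
print(bitparallel_levenshtein("kitten", "sitting"))  # 3
```

Under this assumption, a sequence of up to 64 characters fits in a single word, while a 65-character sequence already needs two, which is where carry handling between words enters the picture.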

maxbachmann · Oct 06 '22 12:10

This has a couple of problems:

  1. Depending on the metric, it can be hard to implement, since the algorithm's behavior depends on the individual string lengths.
  2. Many of the algorithms use Ukkonen bands to improve the runtime when the user provides a score_cutoff. Since the Ukkonen bands depend on the string lengths, it's not really possible to combine them with SIMD across multiple sequences. Depending on the user-provided score cutoff, the bands can provide a much larger speedup.
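The second point can be illustrated with a small sketch of a Ukkonen-banded Levenshtein distance in plain Python. This is not RapidFuzz's implementation: `ukkonen_band` and `banded_levenshtein` are hypothetical names, and `max_dist` stands in for a distance-style score cutoff. The band is the range of diagonals `j - i` that any alignment with cost at most `max_dist` can touch, and it is derived from both string lengths.

```python
def ukkonen_band(len1, len2, max_dist):
    """Diagonal range (j - i) reachable with cost <= max_dist.

    Assumes len1 <= len2. Returns None when the length difference
    alone already exceeds max_dist.
    """
    delta = len2 - len1
    if delta > max_dist:
        return None
    spare = (max_dist - delta) // 2  # edits left for detours off the diagonals
    return -spare, delta + spare


def banded_levenshtein(s1, s2, max_dist):
    """Levenshtein distance if it is <= max_dist, otherwise max_dist + 1."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1  # the distance is symmetric
    n, m = len(s1), len(s2)
    inf = max_dist + 1
    band = ukkonen_band(n, m, max_dist)
    if band is None:
        return inf
    low, high = band
    # Full rows are allocated for clarity; only cells inside the band are computed.
    prev = [j if j <= high else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        if -i >= low:  # column 0 lies on diagonal -i
            cur[0] = i
        for j in range(max(1, i + low), min(m, i + high) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1, inf)
        prev = cur
    return prev[m]


# Same cutoff, different lengths, different bands:
print(ukkonen_band(6, 7, 3))  # (-1, 2)
print(ukkonen_band(6, 9, 3))  # (0, 3)
print(banded_levenshtein("kitten", "sitting", 3))  # 3
```

Because each pair of strings gets its own band, sequences processed together in SIMD lanes would not share the same set of cells to evaluate, which is the conflict described in the second point.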

maxbachmann · Nov 05 '23 00:11