
Rapid fuzzy string matching in Python using various string metrics

Results 58 RapidFuzz issues

Add banded implementation based on https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.1245&rep=rep1&type=pdf to improve the performance for long sequences.
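
To illustrate what "banded" means here, the following is a minimal sketch of a generic banded dynamic-programming Levenshtein distance. It is not the algorithm from the linked paper and not RapidFuzz's implementation; the function name `banded_levenshtein` and the threshold parameter `k` are assumptions made for this example. The idea is that if only distances up to `k` matter, cells further than `k` from the main diagonal can be skipped.

```python
def banded_levenshtein(a, b, k):
    """Return the Levenshtein distance of a and b if it is <= k, else None.

    Hypothetical helper: only cells with |i - j| <= k are evaluated.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the length difference alone already exceeds k
    inf = k + 1  # any value above k means "too far", so cap there
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        lo, hi = max(1, i - k), min(m, i + k)
        # Allocating a full row keeps the sketch simple; a tuned version
        # would store only the 2k + 1 cells inside the band.
        cur = [inf] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1, inf)
        prev = cur
    return prev[m] if prev[m] <= k else None
```

Each row evaluates at most 2k + 1 cells, so for long sequences and a small `k` far less work is done than in the full N*M table.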

performance

I am trying to do fuzzy matching on a dictionary which has a key attached to multiple keywords. 1) I tried combining the keywords into one large string and using...

question

Hi, is there a way to provide alias options? For example, "street", "st", and "road" could be aliases in some scenarios. How can this be done? Thank you.
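
One possible approach, sketched under assumptions: normalise aliases in a preprocessing function before scoring. The alias table and the name `expand_aliases` below are hypothetical and not part of RapidFuzz. RapidFuzz scorers such as `fuzz.ratio` accept a `processor` callable, so a function like this could presumably be passed there.

```python
import re

# Hypothetical alias table: which words count as aliases depends on the
# use case; these groupings are illustrative only.
ALIASES = {"st": "street", "rd": "street", "road": "street"}


def expand_aliases(text: str) -> str:
    """Lowercase the text and map each word to its canonical alias."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(ALIASES.get(tok, tok) for tok in tokens)
```

With this table, `expand_aliases("Main St.")` and `expand_aliases("Main Road")` both yield `"main street"`, so the two inputs would compare as identical after preprocessing.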

This is not an issue, but to also tell about the positive things, when comparing the execution time on 10 million string pairs (2-22 characters long) stored on a Pandas...

Using Hirschberg's algorithm when calculating Indel.editops / LCS.editops would significantly reduce the memory usage. This is already done for Levenshtein, which reduces the memory usage from O(N*M) to a maximum...

performance

For sequences with lengths over 64 characters, it would still be possible to calculate the similarity for multiple sequences in parallel using SIMD. However, for very long sequences it might...

performance

Would it be possible to write a function that computes the similarity between the corresponding elements of 2 lists of equal length? I am hoping for a ridiculous speed-up of...
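
To make the request concrete, here is a minimal sketch of the element-wise semantics in plain Python. The names `pairwise_scores` and `default_scorer` are hypothetical, and the standard-library `difflib` scorer is only a stand-in so the example runs without extra dependencies; a RapidFuzz scorer such as `fuzz.ratio` could be passed instead. This shows the intended behaviour only and does not provide the speed-up the issue is asking for.

```python
from difflib import SequenceMatcher


def default_scorer(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale, based on difflib."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def pairwise_scores(left, right, scorer=default_scorer):
    """Score left[i] against right[i] for every index i."""
    if len(left) != len(right):
        raise ValueError("both lists must have the same length")
    return [scorer(a, b) for a, b in zip(left, right)]
```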

As described in #7, metrics like the Levenshtein distance only make much sense for languages like Chinese if there is support for romanisation. @mrtolkien @lingvisa I opened this new issue...

enhancement
help wanted
discussion

# Background I'm using `fuzz.partial_ratio_alignment()` to align each paragraph of a novel to a timestamped transcription from the novel's corresponding audiobook. This will allow me to know how far into...

enhancement

The matching results could be improved by using unicode normalization on them. This should be a processor function, since users might be interested in the distance without normalization. In addition...
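
As a rough sketch of what such a processor could look like, the function below wraps the standard library's `unicodedata.normalize`. The name `normalize_unicode` and the default form `NFKC` are assumptions for this example, not an existing RapidFuzz API; which normalization form is appropriate depends on the data.

```python
import unicodedata


def normalize_unicode(text: str, form: str = "NFKC") -> str:
    """Return text in the given Unicode normalization form."""
    return unicodedata.normalize(form, text)
```

After normalization, the decomposed sequence "e" + U+0301 and the precomposed "é" (U+00E9) become the same string, so they would no longer count as an edit. Because RapidFuzz scorers accept a `processor` callable, a function like this could presumably be passed as `processor=normalize_unicode`, which leaves the unnormalized distance available to users who want it.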

enhancement