RapidFuzz issues

optimal string alignment add banded implementation

Add a banded implementation based on https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.1245&rep=rep1&type=pdf to improve performance for long sequences.

maxbachmann

performance
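
To illustrate what a banded computation looks like, here is a minimal pure-Python sketch of the optimal string alignment (OSA) distance that only evaluates cells within `max_dist` of the main diagonal. It is a generic cutoff band, not necessarily the algorithm from the linked paper and not RapidFuzz's implementation; the function name `osa_distance_banded` and the `max_dist` parameter are assumptions made for this example.

```python
def osa_distance_banded(s1, s2, max_dist):
    """Return the OSA distance if it is <= max_dist, otherwise max_dist + 1.

    Hypothetical helper for illustration only. Rows are allocated at full
    width for clarity; a real implementation would store just the band.
    """
    n, m = len(s1), len(s2)
    inf = max_dist + 1  # sentinel meaning "more than max_dist"
    if abs(n - m) > max_dist:
        return inf

    prev2 = None  # row i - 2, needed for transpositions
    prev = [j if j <= max_dist else inf for j in range(m + 1)]  # row 0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        if i <= max_dist:
            cur[0] = i
        # Only columns with |i - j| <= max_dist can hold a value <= max_dist.
        lo = max(1, i - max_dist)
        hi = min(m, i + max_dist)
        for j in range(lo, hi + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            best = min(prev[j] + 1,         # deletion
                       cur[j - 1] + 1,      # insertion
                       prev[j - 1] + cost)  # match or substitution
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                best = min(best, prev2[j - 2] + 1)  # adjacent transposition
            cur[j] = min(best, inf)
        prev2, prev = prev, cur
    return min(prev[m], inf)
```

With a small `max_dist`, the inner loop touches roughly `2 * max_dist + 1` cells per row instead of the full row, which is where the gain for long sequences would come from.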

Fuzzy matching a dict with key to a list of keywords

1

I am trying to do fuzzy matching on a dictionary in which each key is attached to multiple keywords. 1) I tried combining the keywords into one large string and using...

ayushb3
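
One possible way to approach this question is to score the query against every keyword individually and report the key that owns the best-scoring keyword. The sketch below assumes a mapping from key to list of keywords and uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer; `best_key` is a hypothetical helper, not part of RapidFuzz.

```python
from difflib import SequenceMatcher


def best_key(query, keyword_map):
    """Return (key, keyword, score) for the keyword most similar to query.

    Hypothetical helper: keyword_map maps each key to a list of keywords.
    The score is a difflib ratio in [0, 1], used here as a stand-in scorer.
    """
    best = None
    for key, keywords in keyword_map.items():
        for keyword in keywords:
            score = SequenceMatcher(None, query.lower(), keyword.lower()).ratio()
            if best is None or score > best[2]:
                best = (key, keyword, score)
    return best
```

Scoring keywords one at a time avoids diluting a good match inside one large concatenated string.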

question

Is there an alias feature?

5

Hi, is there a way to provide alias options? For example, "street", "st", and "road" could be aliases in some scenarios. How can this be done? Thank you.

mkandulavm
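
One way aliases could be handled is as a preprocessing step that rewrites every alias to a single canonical token before the strings are compared. The sketch below is an assumption-laden example: the `ALIASES` table and the `expand_aliases` function are hypothetical and not part of RapidFuzz.

```python
import re

# Hypothetical alias table: each variant maps to one canonical token.
ALIASES = {"st": "street", "road": "street"}


def expand_aliases(text):
    """Lowercase the text, split it into word tokens, and replace known aliases."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(ALIASES.get(token, token) for token in tokens)
```

A function like this could be passed as the `processor` argument that RapidFuzz scorers accept, so that "Main St." and "Main Road" are compared in their canonical form.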

Benchmark

4

This is not an issue, but a note on the positive side: when comparing the execution time on 10 million string pairs (2-22 characters long) stored on a Pandas...

mirix

use Hirschberg's algorithm for Indel.editops / LCS.editops

Using Hirschberg's algorithm when calculating Indel.editops / LCS.editops would significantly reduce the memory usage. This is already done for Levenshtein which reduces the memory usage from O(N*M) to a maximum...

maxbachmann

performance
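
As an illustration of the divide-and-conquer idea, here is a minimal pure-Python sketch that recovers a longest common subsequence with Hirschberg's algorithm, keeping only single rows of the table in memory. It returns the LCS string rather than edit operations, and the helper names `lcs_last_row` and `hirschberg_lcs` are assumptions for this example, not RapidFuzz internals.

```python
def lcs_last_row(a, b):
    """Last row of the LCS length table for a against b, using O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg_lcs(a, b):
    """Return one longest common subsequence of the strings a and b."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # LCS lengths of the first half of a against every prefix of b ...
    left = lcs_last_row(a[:mid], b)
    # ... and of the second half of a against every suffix of b.
    right = lcs_last_row(a[mid:][::-1], b[::-1])
    n = len(b)
    # Split b where the two halves together give the longest subsequence.
    split = max(range(n + 1), key=lambda k: left[k] + right[n - k])
    return hirschberg_lcs(a[:mid], b[:split]) + hirschberg_lcs(a[mid:], b[split:])
```

The same recursion can emit insert and delete operations instead of characters, which is how it would apply to an editops-style result.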

add SIMD support for long sequences

1

For sequences longer than 64 characters, it would still be possible to calculate the similarity for multiple sequences in parallel using SIMD. However, for very long sequences it might...

maxbachmann

performance

Feature Request: Computing the distance/similarity for 2 lists of same length

Would it be possible to write a function that computes the similarity between the corresponding elements of 2 lists of equal length? I am hoping for a ridiculous speed-up of...

thomasryde
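
The requested behaviour can be sketched in plain Python as an element-wise comparison over two equally long lists. This is only a reference for the semantics, not a fast implementation: `pairwise_scores` is a hypothetical name, and `difflib.SequenceMatcher` is used as a stand-in default scorer.

```python
from difflib import SequenceMatcher


def difflib_ratio(a, b):
    """Stand-in scorer: difflib similarity scaled to the range 0-100."""
    return SequenceMatcher(None, a, b).ratio() * 100


def pairwise_scores(left, right, scorer=difflib_ratio):
    """Score left[i] against right[i] for every index i."""
    if len(left) != len(right):
        raise ValueError("both lists must have the same length")
    return [scorer(a, b) for a, b in zip(left, right)]
```

Any two-argument scorer, such as `rapidfuzz.fuzz.ratio`, could be supplied in place of the stand-in; the speed-up asked for in the issue would come from doing this loop in native code.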

add support for romanisation

4

As described in #7, metrics like the Levenshtein distance only make much sense for languages like Chinese if there is support for romanisation. @mrtolkien @lingvisa I opened this new issue...

maxbachmann

enhancement

help wanted

discussion

Feature request: Time limit on long-running functions

2

Background: I'm using `fuzz.partial_ratio_alignment()` to align each paragraph of a novel to a timestamped transcription from the novel's corresponding audiobook. This will allow me to know how far into...

shirakaba

enhancement

perform Unicode normalization

The matching results could be improved by applying Unicode normalization to the input strings. This should be a processor function, since users might be interested in the distance without normalization. In addition...

maxbachmann

enhancement
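
A processor of this kind could be as small as a wrapper around the standard library's `unicodedata.normalize`. The sketch below is an assumption about how such a processor might look; the name `normalize_unicode` and the default form `NFKC` are choices made for this example.

```python
import unicodedata


def normalize_unicode(text, form="NFKC"):
    """Return text in the given Unicode normalization form (NFC, NFD, NFKC, NFKD)."""
    return unicodedata.normalize(form, text)
```

With normalization applied, a precomposed "é" and an "e" followed by a combining acute accent become the same string, so they no longer count as a difference. The function could be supplied wherever a `processor` callable is accepted.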
