RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
The [Smith-Waterman algorithm](https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm) is commonly used to compare strings by local alignment. It would be useful to add it to RapidFuzz.
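For illustration, the core of the requested algorithm can be sketched in plain Python. This is a minimal sketch, not RapidFuzz code: the function name `smith_waterman_score` and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for the example, and it uses a linear gap penalty and returns only the best local alignment score.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (hypothetical helper).

    Scoring values are illustrative assumptions; gaps use a linear penalty.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Clamping at 0 lets an alignment restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Because scores are clamped at zero, unrelated prefixes and suffixes do not lower the result, which is what distinguishes this from a global edit distance.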
I've been trying to package RapidFuzz for Gentoo. Unfortunately, we're nowhere near close to being ready to switch to Cython 3.x, so the requirement on an alpha version of Cython makes...
Hi, Thanks for writing rapidfuzz - it's been really helpful for me. I noticed some unexpected behaviour in the partial_ratio function. In my case, when the length of the shorter...
I’d love to see ratio/distance functions for Levenshtein matching bound to either the beginning or the end of the longer string. (In the sense that partial_ratio is unbound in both...
Dear Max Bachmann, today I was glad to find your package, which is better than fuzzywuzzy. Thanks! I found a possible bug in fuzz.token_set_ratio, which claims "Compares the words in the strings...
Currently there are only the functions editops/opcodes, which return one possible optimal alignment. However, there can be more than one optimal alignment. It would make sense to add the possibility...
The Damerau-Levenshtein distance is a commonly used metric to compare strings. Support for this could be added based on https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.1245&rep=rep1&type=pdf.
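As a reference point for what the metric computes, here is a minimal pure-Python sketch of the unrestricted Damerau-Levenshtein distance using the textbook dynamic-programming formulation. It is not the algorithm from the linked paper and not RapidFuzz code; the function name `damerau_levenshtein` is an assumption made for the example.

```python
def damerau_levenshtein(a, b):
    """Unrestricted Damerau-Levenshtein distance (hypothetical helper).

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters, each with cost 1.
    """
    len_a, len_b = len(a), len(b)
    max_dist = len_a + len_b  # sentinel larger than any real distance
    last_row = {}  # last row (1-based) in which each character of a occurred
    # The matrix carries an extra sentinel row and column at index 0.
    d = [[max_dist] * (len_b + 2) for _ in range(len_a + 2)]
    for i in range(len_a + 1):
        d[i + 1][1] = i
    for j in range(len_b + 1):
        d[1][j + 1] = j
    for i in range(1, len_a + 1):
        last_match_col = 0  # last column in this row where characters matched
        for j in range(1, len_b + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_match_col
            cost = 1
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,      # substitution or match
                d[i + 1][j] + 1,     # insertion
                d[i][j + 1] + 1,     # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),  # transposition
            )
        last_row[a[i - 1]] = i
    return d[len_a + 1][len_b + 1]
```

The pair `"ca"` and `"abc"` is a useful check: the unrestricted distance is 2, whereas the restricted (optimal string alignment) variant would give 3.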
A banded version of the Levenshtein distance algorithm should be implemented as described in http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.6975&rep=rep1&type=pdf. This would reduce the runtime of scorers like `fuzz.ratio` from `O(N/64 * M)` to `O(score_cutoff/64...
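The idea of restricting the computation to a band can be sketched with an ordinary dynamic-programming table. This is only an illustration under stated assumptions: it is not the bit-parallel algorithm from the linked paper, the name `banded_levenshtein` and the `max_dist` parameter are hypothetical, and for clarity it still allocates full rows while evaluating only the cells inside the band.

```python
def banded_levenshtein(a, b, max_dist):
    """Levenshtein distance, or max_dist + 1 if it exceeds max_dist.

    Hypothetical helper: only cells with |i - j| <= max_dist are evaluated.
    """
    inf = max_dist + 1  # stands for "more than max_dist"
    if abs(len(a) - len(b)) > max_dist:
        return inf
    prev = [j if j <= max_dist else inf for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        lo = max(1, i - max_dist)
        hi = min(len(b), i + max_dist)
        curr = [inf] * (len(b) + 1)  # cells outside the band stay at inf
        if i <= max_dist:
            curr[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost, prev[j] + 1, curr[j - 1] + 1, inf)
        if min(curr) >= inf:
            return inf  # every cell in the band already exceeds the cutoff
        prev = curr
    return prev[len(b)]
```

With a small cutoff the inner loop touches at most `2 * max_dist + 1` cells per row, which is the source of the speedup the issue describes.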
It would be helpful to add RapidFuzz as a Cygwin package, so no compilation is required for Cygwin users: https://github.com/spotDL/spotify-downloader/issues/1306
The current readme does not reflect many of the most recent changes. It would make sense to update it accordingly: - [ ] From the current readme it appears like...