fuzzywuzzy

Fuzzy String Matching in Python

100 fuzzywuzzy issues

I have Python-Levenshtein fully installed. When I run `py -m pip install Python-Levenshtein`, it says: `Requirement already up-to-date: Python-Levenshtein in {location of install}`. So I have Python-Levenshtein. I...
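One common cause worth ruling out (an assumption, since the rest of the report is cut off): `py -m pip` installs into whichever interpreter the `py` launcher selects, and that is not necessarily the interpreter that runs fuzzywuzzy. This minimal sketch reports which interpreter is running and whether it can import `Levenshtein`, the module that the Python-Levenshtein package provides. The helper name `levenshtein_importable` is hypothetical.

```python
import importlib.util
import sys


def levenshtein_importable() -> bool:
    # True if the *current* interpreter can find the `Levenshtein` module
    # installed by the Python-Levenshtein package.
    return importlib.util.find_spec("Levenshtein") is not None


if __name__ == "__main__":
    # Run this with the same command you use to run your fuzzywuzzy code,
    # then compare the path with the one `py -m pip` installed into.
    print("Interpreter:", sys.executable)
    print("Levenshtein importable:", levenshtein_importable())
```

If the two interpreter paths differ, installing with `<that interpreter> -m pip install Python-Levenshtein` targets the right environment.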

From version 0.13 onward, there's a mismatch between `process.extract(scorer=fuzz.ratio)` scores and `fuzz.ratio`.

```python
# fuzzywuzzy==0.12
from fuzzywuzzy import process, fuzz

print(process.extract('OdCeny', ['producent'], scorer=fuzz.ratio))
print(fuzz.ratio('producent', 'OdCeny'))
```

prints:

```bash
[('producent', 40)]
40
```
...
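One plausible source of this kind of mismatch is preprocessing: if `process.extract` runs a processor that lowercases both strings before calling the scorer, while `fuzz.ratio` called directly compares the raw strings, the two scores diverge. The sketch below tests that hypothesis with difflib rather than fuzzywuzzy; `ratio` and `lowercase_processor` are hypothetical stand-ins, not fuzzywuzzy's functions.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # Hypothetical stand-in for a pure-Python fuzz.ratio:
    # difflib similarity scaled to 0-100 and rounded to an int.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def lowercase_processor(s: str) -> str:
    # Hypothetical processor: assumes the extract path normalises case
    # (among other things) before the scorer sees the strings.
    return s.lower().strip()


raw = ratio("producent", "OdCeny")  # scorer applied to the raw strings
processed = ratio(lowercase_processor("producent"), lowercase_processor("OdCeny"))
print(raw, processed)
```

With difflib, the raw pair scores 40 and the lowercased pair scores 67, because lowercasing lets "Od" and "C" line up with "od" and "c" in "producent".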

The `partial_ratio` calculation seems to yield incorrect results for certain combinations of strings when it uses the python-Levenshtein `SequenceMatcher`. This works well:

```python
> fuzz.partial_ratio('this is a test', 'is...
```
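As a rough model of how a block-aligned `partial_ratio` works, here is a sketch modelled on fuzzywuzzy's approach but not taken from its code. It asks difflib's `SequenceMatcher` for matching blocks, turns each block into one window of the longer string, and keeps the best window score. Because only block-implied windows are scored, a matcher that reports different blocks can end up scoring different windows and returning a different result. `partial_ratio_sketch` is a hypothetical name.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    if not s1 or not s2:
        return 0
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    best = 0.0
    # Each matching block (i, j, size) suggests aligning shorter[i] with
    # longer[j]; score the len(shorter)-wide window that alignment implies.
    for i, j, _size in SequenceMatcher(None, shorter, longer).get_matching_blocks():
        start = max(j - i, 0)
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


print(partial_ratio_sketch("red", "random"))
print(partial_ratio_sketch("pred", "random"))
```

With difflib's blocks, this sketch returns 33 for `("red", "random")` and 50 for `("pred", "random")`.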

```
In [47]: fuzzywuzzy.fuzz.partial_ratio("red", "random")
Out[47]: 33

In [48]: fuzzywuzzy.fuzz.partial_ratio("rod", "random")
Out[48]: 33

In [49]: fuzzywuzzy.fuzz.partial_ratio("prod", "random")
Out[49]: 25

In [50]: fuzzywuzzy.fuzz.partial_ratio("pred", "random")
Out[50]: 50
```

Why is "pred" more...
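To see which windows could explain these numbers, one can score the shorter string against every same-length window of the longer one. This is an exhaustive check written with difflib, not fuzzywuzzy's block-aligned algorithm. `window_scores` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def window_scores(s1: str, s2: str) -> List[Tuple[str, int]]:
    # Score the shorter string against EVERY window of the longer string
    # with the same length (exhaustive, not block-aligned), on a 0-100 scale.
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    n = len(shorter)
    return [
        (longer[i:i + n],
         int(round(100 * SequenceMatcher(None, shorter, longer[i:i + n]).ratio())))
        for i in range(len(longer) - n + 1)
    ]


for query in ("prod", "pred"):
    print(query, window_scores(query, "random"))
```

Exhaustively, both "prod" and "pred" peak at 50 on the window "rand". The session above reports 25 for "prod", which equals this sketch's score for the window "ndom". That suggests the "rand" window was never scored for that query, so the block alignment rather than the per-window ratio produced the gap. This sketch cannot confirm that, however.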

Greetings fuzzers: I am looking to measure edits over large documents. I notice that fuzzywuzzy returns the ratio only as an integer between 0 and 100. Is there a historical reason for...
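If an unrounded score is what you need, difflib's `SequenceMatcher.ratio()` already returns a float in [0, 1]. The minimal sketch below scales it to 0-100 without rounding. This assumes the pure-Python difflib measure is acceptable; python-Levenshtein's ratio can give different values. `precise_ratio` is a hypothetical name. For large documents, it also disables difflib's automatic junk heuristic, which otherwise applies to sequences of 200 or more items.

```python
from difflib import SequenceMatcher


def precise_ratio(a: str, b: str) -> float:
    # Unrounded similarity on a 0-100 scale.
    # autojunk=False turns off difflib's "popular element" heuristic, which for
    # sequences of 200+ items treats frequent characters as junk and can
    # distort scores on long texts.
    return 100 * SequenceMatcher(None, a, b, autojunk=False).ratio()


print(precise_ratio("kitten", "sitting"))
```

Keeping `autojunk=False` makes results on long documents more faithful, at the cost of slower matching.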

Update the `extractOne` docstring to match the return type of `extractWithoutOrder`.

This will solve issue #307, in which preprocessing was applied inconsistently, so comparing a string with itself could produce a score other than 100. For example:

```
list(process.extractWithoutOrder("杰弗里·S·布里特", ["杰弗里·S·布里特"],...
```
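The effect can be reproduced in miniature with a hypothetical normaliser, loosely modelled on the cleanup that fuzzywuzzy's `utils.full_process` performs but not its actual code. If the query is reduced to ASCII and the choice is not, the same name no longer matches itself. `normalize` and `ratio` are illustrative helpers.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str, force_ascii: bool) -> str:
    # Hypothetical normaliser: optionally drop non-ASCII characters, replace
    # non-word characters (such as the middle dot) with spaces, lowercase, trim.
    if force_ascii:
        s = s.encode("ascii", "ignore").decode("ascii")
    s = re.sub(r"\W", " ", s)
    return s.lower().strip()


def ratio(a: str, b: str) -> int:
    # difflib similarity on a 0-100 integer scale.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


name = "杰弗里·S·布里特"
consistent = ratio(normalize(name, True), normalize(name, True))
inconsistent = ratio(normalize(name, True), normalize(name, False))
print(consistent, inconsistent)
```

When both sides go through the same normalisation, the self-comparison scores 100. When only the query is forced to ASCII, it shrinks to `"s"` while the choice keeps its Chinese characters, and the score drops to 20.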

Hi all, I found a bug in `process.extractWithoutOrder()` that causes `process.dedupe()` to fail unexpectedly. For example:

```python
from fuzzywuzzy import process

process.dedupe(["BRITT JEFFREY S", "BRITT JEFFREY S.", "WIEDEMAN SCOTT", "WIEDERMANN SCOTT", "斯科特·维德曼", "杰弗里·S·布里特"])
```
...
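A failure of this kind can be illustrated with a simplified, hypothetical dedupe-style routine. If preprocessing empties a string (here, a name with no ASCII characters), that string no longer matches even itself. Code that assumes every item has at least one above-threshold match (its own) would then index into an empty list. The sketch uses difflib and a hypothetical ASCII-only preprocessor rather than fuzzywuzzy, and guards against that case. `ascii_only`, `score` and `dedupe_sketch` are illustrative names.

```python
from difflib import SequenceMatcher
from typing import List


def ascii_only(s: str) -> str:
    # Hypothetical preprocessor: drop non-ASCII characters, lowercase, trim.
    return s.encode("ascii", "ignore").decode("ascii").lower().strip()


def score(a: str, b: str) -> float:
    # Hypothetical scorer: difflib ratio (0-100) on preprocessed strings.
    pa, pb = ascii_only(a), ascii_only(b)
    if not pa or not pb:
        return 0.0  # nothing left to compare once preprocessing has run
    return 100 * SequenceMatcher(None, pa, pb).ratio()


def dedupe_sketch(items: List[str], threshold: float = 70) -> List[str]:
    canonical = []
    for item in items:
        similar = [c for c in items if score(item, c) > threshold]
        if not similar:
            # The item does not even match itself after preprocessing, so a
            # naive `similar[0]` would raise IndexError; keep the item instead.
            canonical.append(item)
            continue
        # Prefer the longest match, breaking ties alphabetically.
        canonical.append(min(similar, key=lambda c: (-len(c), c)))
    # Drop repeats while keeping first-seen order.
    return list(dict.fromkeys(canonical))


names = ["BRITT JEFFREY S", "BRITT JEFFREY S.", "WIEDEMAN SCOTT",
         "WIEDERMANN SCOTT", "斯科特·维德曼", "杰弗里·S·布里特"]
print(dedupe_sketch(names))
```

Here "斯科特·维德曼" becomes an empty string after preprocessing and is kept unchanged, while the two spellings of each Latin-script name collapse to the longer one.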

Implemented a solution to the following issue: https://github.com/seatgeek/fuzzywuzzy/issues/272. `token_sim_ratio(s1, s2, ...)` robustly handles the problems that `fuzz.token_sort_ratio(s1, s2, ...)` introduces by sorting the tokens of the 2nd string lexicographically. The...
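The sorting problem can be illustrated with a simplified token-sort scorer. This is an assumption: it is difflib-based and only lowercases and splits on whitespace. It is neither fuzzywuzzy's implementation nor the proposed `token_sim_ratio`, whose code is not shown here. A one-letter typo that changes a token's first letter can move that token to a different position after sorting, so otherwise matching tokens stop lining up. `ratio` and `token_sort_ratio_sketch` are hypothetical helpers.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # difflib similarity on a 0-100 integer scale (stand-in for fuzz.ratio).
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    # Simplified token-sort scoring: lowercase, split on whitespace,
    # sort tokens lexicographically, rejoin with spaces, then compare.
    def sort_tokens(s: str) -> str:
        return " ".join(sorted(s.lower().split()))

    return ratio(sort_tokens(s1), sort_tokens(s2))


print(token_sort_ratio_sketch("John Smith", "Smith John"))  # same tokens, any order
print(token_sort_ratio_sketch("John Smith", "Smith Tohn"))  # typo flips sort order
print(ratio("john smith", "tohn smith"))                    # same typo, aligned order
```

Reordered but identical tokens score 100. The typo "Tohn" sorts after "smith", however, so only "smith" aligns and the score falls to 50. The same typo compared in aligned order still scores 90.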