RapidFuzz
Rapid fuzzy string matching in Python using various string metrics
It would make sense to add a BK-tree implementation for `scorers` that fulfill the triangle inequality. This would provide massive performance improvements for things like searches. https://dl.acm.org/doi/10.1145/362003.362025
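To make the idea concrete, here is a minimal sketch of a BK-tree in pure Python. It is not RapidFuzz code: the `BKTree` class, its `add`/`search` methods, and the `levenshtein` helper are hypothetical names made up for illustration, and the plain dynamic-programming Levenshtein distance stands in for any scorer that satisfies the triangle inequality.

```python
def levenshtein(a, b):
    """Plain dynamic-programming Levenshtein distance (a true metric)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Hypothetical BK-tree: each node is (word, {edge_distance: child})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_distance):
        """Return sorted (distance, word) pairs within max_distance."""
        results = []
        if self.root is None:
            return results
        stack = [self.root]
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_distance:
                results.append((d, word))
            # Triangle inequality: a match can only sit in a subtree whose
            # edge distance lies within [d - max_distance, d + max_distance].
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


words = ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]
tree = BKTree()
for w in words:
    tree.add(w)
matches = tree.search("bok", 1)
print(matches)
```

The pruning step is where the savings would come from: subtrees whose edge distance falls outside the allowed window are skipped without computing any distances for their words. How much this helps in practice depends on the data and the search radius.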
SIMD support is still missing for:
- [ ] process.extractOne
- [ ] process.extract
- [ ] process.cdist when both sequences are similar.
- [ ] process.extract_iter
I run the same code in PyCharm and on Linux, but I get different results. python: from rapidfuzz import fuzz score = fuzz.token_set_ratio("It is an apple", "It is an apple juice") print(score) In...
I've added two examples to the `fuzz.token_set_ratio` docs: One showing if one string is a subset of the other, it will return 100.0 and the other showing a divergence from...
I got the latest version of the library, but I get the following error: AttributeError Traceback (most recent call last) Cell In [55], line 3 1 from rapidfuzz import process...
```
================ FAILURES ================
________ test_large_prefix_weight ________

    def test_large_prefix_weight():
>       assert pytest.approx(JaroWinkler.similarity('milyarder', 'milyarderlik',prefix_weight=0.5)) == 1.0

tests/distance/test_JaroWinkler.py:13:
_ _ _ _ _ _ _ _ _ _ _ _ _ _...
```
Where can I see the implementation of `.partial_ratio()`? Could you explain the logic this method uses? Thanks in advance!
While I had used rapidfuzz previously without issues in my Python scripts, now every script that uses it crashes with an error such as `illegal hardware instruction` (Console) or `Terminated...
Trying macos-12.
I'm using rapidfuzz 3.9.6 and Python 3.10. I have carefully read the rapidfuzz docs and run some tests. I found that partial_ratio uses Indel.normalized_similarity. Could partial_ratio also support Levenshtein.normalized_similarity? Maybe default to using Indel.normalized_similarity,...
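For readers unfamiliar with the two metrics named in this request, the following pure-Python sketch shows how they differ. The functions are hypothetical helpers written for illustration, not the RapidFuzz implementation: the Indel distance allows only insertions and deletions, so it equals `len(a) + len(b) - 2 * LCS(a, b)` and is normalized by the combined length, while the uniform-cost Levenshtein distance also allows substitutions and is normalized by the longer length. Returning 1.0 for two empty strings is an assumption of this sketch.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def levenshtein_distance(a, b):
    """Uniform-cost Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def indel_normalized_similarity(a, b):
    """1 - indel_distance / (len(a) + len(b)); insertions and deletions only."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings count as identical
    distance = total - 2 * lcs_length(a, b)
    return 1.0 - distance / total


def levenshtein_normalized_similarity(a, b):
    """1 - levenshtein_distance / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - levenshtein_distance(a, b) / longest


indel_score = indel_normalized_similarity("kitten", "sitting")
lev_score = levenshtein_normalized_similarity("kitten", "sitting")
print(indel_score, lev_score)
```

The two scores generally differ for the same pair of strings, which is why the choice of underlying metric would change what a partial-match scorer returns.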