Max Bachmann
> But, is an equivalent call exposed in C++?

So far this feature does not exist in either of them. However, it will absolutely be implemented in C++. The...
> This is heavy in string manipulation but if you want to use one of the sort scorers, like token_set_ratio or similar, you can do it like that. This can...
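The approach this comment points to is cut off in the excerpt, so the following is only a generic sketch of what a "sort" scorer does: normalize token order with plain string manipulation, then hand the result to any similarity function. The helper names `sort_tokens` and `token_sort_similarity` are hypothetical, and `difflib.SequenceMatcher` is a standard-library stand-in, not the metric rapidfuzz uses.

```python
from difflib import SequenceMatcher


def sort_tokens(text: str) -> str:
    """Split on whitespace, sort the tokens, and join them with single spaces."""
    return " ".join(sorted(text.split()))


def token_sort_similarity(a: str, b: str) -> float:
    """Order-insensitive similarity in [0, 1].

    Assumption: SequenceMatcher stands in for the real scorer here; any
    string similarity function could be applied to the sorted strings.
    """
    return SequenceMatcher(None, sort_tokens(a), sort_tokens(b)).ratio()


# Both inputs become "mets new york" after sorting, so they compare as equal.
print(token_sort_similarity("new york mets", "mets new york"))
```

The sorting step is where the string manipulation cost comes from: each input is split, sorted, and rebuilt before any comparison happens.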
> we would prefer not to add rapidfuzz as a dependency, but would instead prefer to directly include one edit distance / similarity implementation in spacy (possibly from rapidfuzz, or...
> a small, fast, MIT-licensed implementation of Levenshtein distance here https://github.com/roy-ht/editdistance Would this be acceptable to either import or copy into spaCy? What are the expected text lengths for the...
My general recommendation is:
- for short sequences (< 64 chars): rapidfuzz / polyleven
- for mid-length sequences: rapidfuzz
- for very long strings (> 50k characters): rapidfuzz / edlib

I...
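The comment does not say why 64 characters is the boundary for short sequences. One plausible reading, stated here as an assumption, is that bit-parallel Levenshtein algorithms can hold the whole state for a pattern of up to 64 characters in a single 64-bit machine word. The sketch below shows Hyyrö's bit-parallel formulation in pure Python; the function name is hypothetical, and this is an illustration of the technique rather than the code of any of the libraries named above.

```python
def levenshtein_bitparallel(s1: str, s2: str) -> int:
    """Levenshtein distance via Hyyrö's bit-parallel algorithm.

    Python integers are unbounded, so this works for any length. In C or
    C++ the same state fits one 64-bit word when len(s1) <= 64.
    """
    m = len(s1)
    if m == 0:
        return len(s2)

    full = (1 << m) - 1   # mask keeping only the m low bits
    last = 1 << (m - 1)   # bit for the last character of s1

    # For each character, a bitmask of the positions where it occurs in s1.
    pattern = {}
    for i, ch in enumerate(s1):
        pattern[ch] = pattern.get(ch, 0) | (1 << i)

    vp, vn = full, 0      # vertical +1 / -1 deltas of the current column
    dist = m              # distance between s1 and the empty prefix of s2
    for ch in s2:
        x = pattern.get(ch, 0)
        d0 = ((((x & vp) + vp) ^ vp) | x | vn) & full
        hp = (vn | ~(d0 | vp)) & full   # horizontal +1 deltas
        hn = d0 & vp                    # horizontal -1 deltas
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1
        hp = ((hp << 1) | 1) & full
        hn = (hn << 1) & full
        vp = (hn | ~(d0 | hp)) & full
        vn = hp & d0
    return dist


print(levenshtein_bitparallel("kitten", "sitting"))
```

Each character of `s2` costs a fixed number of word operations instead of one step per character of `s1`, which is why fitting the pattern into a single word matters for short inputs.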
This leads e.g. to the following issue:

```
Python 3.10.6 (main, Aug 2 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for...
```
@da-woods apparently we do rely on the mutability, so I am unsure how we want to fix this. edit: as far as I can see this is a new test...
I would love this feature as well
I have the following test in my projects: https://github.com/maxbachmann/RapidFuzz/blob/f69be93ca84e28354d16c874cb0b2d6407f238cd/.github/workflows/branchbuild.yml#L117-L120
I am not aware of anyone working on the JNI wrapper yet. I am not familiar with Java myself, so this would need the help of someone familiar with Java...