Max Bachmann

Results: 300 comments by Max Bachmann

> But, is an equivalent call exposed in c++ ?

So far this feature does not exist in either of them. However it will absolutely be implemented in C++. The...

> This is heavy in string manipulation but if you want to use one of the sort scorers, like token_set_ratio or similar, you can do it like that. This can...

> we would prefer not to add rapidfuzz as a dependency, but would instead prefer to directly include one edit distance / similarity implementation in spacy (possibly from rapidfuzz, or...

> a small, fast, MIT-licensed implementation of Levenshtein distance here https://github.com/roy-ht/editdistance Would this be acceptable to either import or copy into spaCy?

What are the expected text lengths for the...

My general recommendation is:
- for short sequences (< 64 chars): rapidfuzz / polyleven
- for mid-length sequences: rapidfuzz
- for very long strings (> 50k characters): rapidfuzz / edlib

I...
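For reference, the quantity these libraries compute can be sketched in pure Python. This is a textbook dynamic-programming version with unit costs, not the optimized implementation of rapidfuzz, polyleven, or edlib; the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insert, delete, and substitute.

    Textbook two-row dynamic programming: O(len(a) * len(b)) time,
    O(min(len(a), len(b))) extra memory. Illustrative only.
    """
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete ch_a
                    current[j - 1] + 1,                   # insert ch_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The quadratic running time of this sketch is the reason a dedicated library is the practical choice once inputs grow beyond short strings.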

This leads e.g. to the following issue:
```
Python 3.10.6 (main, Aug 2 2022, 00:00:00) [GCC 12.1.1 20220507 (Red Hat 12.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for...
```

@da-woods apparently we do rely on the mutability, so I am unsure how we want to fix this. edit: as far as I can see this is a new test...

I have the following test in my projects: https://github.com/maxbachmann/RapidFuzz/blob/f69be93ca84e28354d16c874cb0b2d6407f238cd/.github/workflows/branchbuild.yml#L117-L120

I am not aware of anyone working on the JNI wrapper yet. I am not familiar with Java myself, so this would need the help of someone familiar with Java...