StringCompare
Efficient String Comparison Functions and Fuzzy String Matching
## Check if release PR fulfills these requirements

- [ ] Changelog has been updated and breaking changes have been marked (if any).
- [ ] Tests have been added...
Currently, every list passed to a StringCompare function is copied before operations are applied. This is due to the use of STL containers as input types and implicit conversions done...
The check for the null case should be done at the token bag level rather than the string level: https://github.com/OlivierBinette/StringCompare/blob/be58f4c1c9c24bc2cef5d9bb81053fa7ea003792/stringcompare/distance/jaccard.py#L17

I would recommend refactoring jaccard.py as follows:

1. Have the `jacard()`...
https://en.wikipedia.org/wiki/Levenshtein_automaton
Create a user example (see #2) which shows how **StringCompare** can be used to match business names. That is, suppose we have a long list L of business names. Given...
Implement [min-hashing](https://en.wikipedia.org/wiki/MinHash) as an approximation to the Jaccard similarity.
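A minimal sketch of the idea, not StringCompare's implementation: the names `minhash_signature` and `minhash_similarity`, the seeded BLAKE2b hashing scheme, and the default of 128 hash functions are all assumptions made for illustration. Each of the `num_hashes` seeded hash functions records the smallest hash value over the token set, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the underlying sets.

```python
import hashlib
from typing import Iterable, List


def _hash(token: str, seed: int) -> int:
    """Hypothetical helper: 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens: Iterable[str], num_hashes: int = 128) -> List[int]:
    """Return one minimum hash value per seeded hash function."""
    token_set = set(tokens)
    if not token_set:
        # Assumption: an empty token set has no meaningful signature.
        raise ValueError("cannot compute a signature for an empty token set")
    return [min(_hash(tok, seed) for tok in token_set) for seed in range(num_hashes)]


def minhash_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Signatures can be computed once per string and stored, so comparing two strings costs a pass over `num_hashes` integers regardless of how many tokens each string has. More hash functions reduce the variance of the estimate at the cost of larger signatures.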
Create an **examples** folder structured as follows:

```
examples/
├─ 1-getting-started.ipynb
├─ 2-interesting-use-case.ipynb
└─ ...
```

Each Python notebook should be numbered and contain a working example of **StringCompare**'s features...
Make the installation process robust to C++ "symbol not found" errors and to the error where the `stringcompare.distance._distance` module is not found.
Implement the [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) function which, given two strings `s` and `t`, and given a Tokenizer instance `tokens`, returns the Jaccard similarity between `tokens(s)` and `tokens(t)`. Here a tokenizer is...
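
The request above could be sketched in pure Python as follows. This is an illustration under stated assumptions, not the library's API: the tokenizer is modelled as any callable from a string to an iterable of tokens, `char_ngrams` is a hypothetical stand-in tokenizer, and treating two empty token sets as having similarity 1.0 is an assumed convention.

```python
from typing import Callable, Iterable, Set


def char_ngrams(s: str, n: int = 2) -> Set[str]:
    """Hypothetical tokenizer: the set of character n-grams of s."""
    if len(s) < n:
        # Short strings yield themselves as a single token (or nothing if empty).
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_similarity(
    s: str,
    t: str,
    tokens: Callable[[str], Iterable[str]] = char_ngrams,
) -> float:
    """Return |A ∩ B| / |A ∪ B| for A = tokens(s) and B = tokens(t)."""
    a, b = set(tokens(s)), set(tokens(t))
    if not a and not b:
        # Assumed convention: two empty token sets count as identical.
        # The check is made on the token sets, not on the raw strings.
        return 1.0
    return len(a & b) / len(a | b)
```

For example, `jaccard_similarity("night", "nacht")` compares the bigram sets `{ni, ig, gh, ht}` and `{na, ac, ch, ht}`, which share one token out of seven distinct ones, giving 1/7. Performing the empty check on the token sets means a non-empty string that tokenizes to nothing is handled the same way as an empty string.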