StringCompare

Efficient String Comparison Functions and Fuzzy String Matching

Results: 13 StringCompare issues

## Check if release PR fulfills these requirements

- [ ] Changelog has been updated and breaking changes have been marked (if any).
- [ ] Tests have been added...

Currently, every list passed to a StringCompare function is copied before operations are applied. This is due to the use of STL containers as input types and implicit conversions done...

enhancement

The check for the null case should be done at the token-bag level rather than the string level: https://github.com/OlivierBinette/StringCompare/blob/be58f4c1c9c24bc2cef5d9bb81053fa7ea003792/stringcompare/distance/jaccard.py#L17

I would recommend refactoring jaccard.py as follows:

1. Have the `jacard()`...

https://en.wikipedia.org/wiki/Levenshtein_automaton

Create a user example (see #2) which shows how **StringCompare** can be used to match business names. That is, suppose we have a long list L of business names. Given...

good first issue

Implement [min-hashing](https://en.wikipedia.org/wiki/MinHash) as an approximation to the Jaccard similarity.
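As a rough illustration of the idea, here is a minimal pure-Python sketch of MinHash. It is not StringCompare's implementation or API: the names `char_ngrams`, `minhash_signature`, and `minhash_similarity` are hypothetical, and the use of seeded BLAKE2b hashes over character bigrams is an assumption made for the example.

```python
import hashlib


def char_ngrams(s, n=2):
    """Return the set of character n-grams of s (a stand-in tokenizer)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def _seeded_hash(token, seed):
    """Hash a token to a 64-bit integer; each seed acts as a separate hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=128):
    """Keep the minimum hash value of the token set under each seeded hash."""
    if not tokens:
        raise ValueError("cannot compute a MinHash signature of an empty token set")
    return [min(_seeded_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def minhash_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets, for comparison."""
    return len(a & b) / len(a | b)


a = char_ngrams("stringcompare")
b = char_ngrams("string compare")
estimate = minhash_similarity(minhash_signature(a, 256), minhash_signature(b, 256))
print(f"exact={exact_jaccard(a, b):.3f} estimate={estimate:.3f}")
```

For a given pair of sets, each signature slot matches with probability equal to their Jaccard similarity, so the estimate tightens as `num_hashes` grows. Signatures have a fixed length, which is what makes them cheaper to compare than large token sets.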

Create an **examples** folder structured as follows:

```
examples/
├─ 1-getting-started.ipynb
├─ 2-interesting-use-case.ipynb
└─ ...
```

Each Python notebook should be numbered and contain a working example of **StringCompare**'s features...

good first issue

Make the installation process robust to C++ "symbol not found" errors and to the error where the `stringcompare.distance._distance` module is not found.

enhancement

Implement the [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) function which, given two strings `s` and `t`, and given a Tokenizer instance `tokens`, returns the Jaccard similarity between `tokens(s)` and `tokens(t)`. Here a tokenizer is...

good first issue
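The Jaccard similarity issue above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the library's code: `word_tokens` is a hypothetical stand-in for a Tokenizer instance (any callable mapping a string to an iterable of tokens), and returning 1.0 when both token sets are empty is a convention chosen for this example.

```python
import re


def word_tokens(s):
    """Hypothetical tokenizer: lowercase word tokens of s, as a set."""
    return set(re.findall(r"\w+", s.lower()))


def jaccard_similarity(s, t, tokens=word_tokens):
    """Return |A & B| / |A | B| where A = tokens(s) and B = tokens(t)."""
    a, b = set(tokens(s)), set(tokens(t))
    # The empty case is checked on the token sets, not on the raw strings:
    # a whitespace-only string is non-empty but yields no tokens.
    if not a and not b:
        return 1.0  # assumed convention: two empty token sets are identical
    return len(a & b) / len(a | b)


print(jaccard_similarity("Acme Corp", "ACME Corporation"))
```

Here the two names share one token (`acme`) out of three distinct tokens, so the similarity is 1/3. Passing a different callable as `tokens` changes the unit of comparison without touching the similarity logic.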