PolyFuzz

Fuzzy string matching, grouping, and evaluation.

Results 30 PolyFuzz issues

I have been experimenting with PolyFuzz for a while and have observed some odd scoring behaviour. The following is a case in which I am getting a very high score of...

In the code below (with output in the attached picture) I perform a simple TF-IDF matching of `["apple", "apples", "appl", "recal", "happy"]`. The initial `min_similarity` is set to 0.2. The similarity...
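For readers who want to reason about this kind of report without the attached picture, here is a minimal standard-library sketch of character-trigram TF-IDF matching with a similarity cutoff. It is an illustration only and not PolyFuzz's implementation: the helper names (`char_ngrams`, `tfidf_vectors`, `cosine`, `best_matches`), the trigram size, and the smoothed IDF formula are all assumptions, so the scores will not necessarily agree with what PolyFuzz reports.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text` (hypothetical helper)."""
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(words, n=3):
    """Build one sparse TF-IDF vector (a dict) per word."""
    grams = [Counter(char_ngrams(w, n)) for w in words]
    doc_freq = Counter(g for counts in grams for g in counts)
    total = len(words)
    vectors = []
    for counts in grams:
        # Smoothed IDF (an assumption), so common n-grams keep a nonzero weight.
        vectors.append({g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1)
                        for g, tf in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(g, 0.0) for g, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_matches(words, min_similarity=0.2, n=3):
    """For each word, return its best other match, or None below the cutoff."""
    vectors = tfidf_vectors(words, n)
    result = {}
    for i, word in enumerate(words):
        scored = [(cosine(vectors[i], vectors[j]), words[j])
                  for j in range(len(words)) if j != i]
        score, match = max(scored)
        result[word] = (match, score) if score >= min_similarity else (None, 0.0)
    return result


words = ["apple", "apples", "appl", "recal", "happy"]
matches = best_matches(words, min_similarity=0.2)
for word, (match, score) in matches.items():
    print(f"{word!r} -> {match!r} ({score:.2f})")
```

In this sketch a word that shares no trigram with any other entry scores 0 against everything and is therefore left unmatched by the cutoff.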

Although I was able to use PolyFuzz once for some of your basic example code, once I tried messing around with Embeddings or BERT, the entire package broke. It seems...

Hello, can this tool be used to calculate the semantic similarity between two words such as "happy" and "sad"? @MaartenGr

Hi -- I'm using Python 3.10. The `time.clock()` function used in timing.py was deprecated and has since been removed from Python. This breaks polyfuzz on import. To make it work, I changed the...
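For context, `time.clock()` was deprecated in Python 3.3 and removed in Python 3.8, and the standard library documents `time.perf_counter()` (or `time.process_time()`) as the replacement. The reporter's actual change is cut off above, so the following is only a sketch of the usual substitution; the `timed` helper is hypothetical and is not taken from PolyFuzz's timing.py.

```python
import time


def timed(func, *args, **kwargs):
    """Run `func` and return (result, elapsed seconds).

    Hypothetical helper: it uses time.perf_counter(), the documented
    replacement for the removed time.clock().
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


result, elapsed = timed(sum, range(1000))
print(result, f"{elapsed:.6f}s")
```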

Keywords "trinkwasser test", "trinkwassertest" and "analyse trinkwasser" aren't clustered at all.

Hi @MaartenGr, I have fine-tuned a fuzzy transformer for character-level similarity to do fuzzy matching. You can read about how I did it here: LinkedIn post explanation: https://www.linkedin.com/feed/update/urn:li:activity:6819456033992253440/ Model on...

Rendering the plots in a dark-themed Jupyter notebook results in unreadable titles (black on dark). By returning the matplotlib Figure object, the user has the option to modify the...

[Faiss](https://github.com/facebookresearch/faiss) allows you to efficiently search and cluster dense vectors. This could be beneficial when comparing the cosine similarities between vectors in the TF-IDF and Embeddings model. However, since it...

enhancement

I need a sentence-level ngram option since I'm checking on similarities between short texts. Maybe this option is useful for others!
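To make the request concrete, here is a small standard-library sketch of what word-level (rather than character-level) n-grams could look like for short texts. This is not an existing PolyFuzz option; the names `word_ngrams` and `jaccard`, the whitespace tokenization, and the use of Jaccard overlap as the score are all assumptions chosen for illustration.

```python
def word_ngrams(text, n=2):
    """Split `text` on whitespace and return overlapping word n-grams as tuples."""
    tokens = text.lower().split()
    if len(tokens) < n:
        # Texts shorter than n words yield a single, shorter n-gram.
        return [tuple(tokens)] if tokens else []
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a, b, n=2):
    """Jaccard overlap between the word n-gram sets of two short texts."""
    set_a, set_b = set(word_ngrams(a, n)), set(word_ngrams(b, n))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


print(word_ngrams("fuzzy string matching made easy"))
print(jaccard("fuzzy string matching", "fast fuzzy string matching"))
```

Word n-grams compare phrases by shared word sequences, so they tolerate reordering less than character n-grams do but are less sensitive to incidental substring overlap between unrelated words.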