PolyFuzz

Fuzzy string matching, grouping, and evaluation.

Results 32 PolyFuzz issues

torch 1.7.0 supports Python only up to 3.8. Could we remove the upper bound altogether?

Hi, When trying to `pip install polyfuzz[flair]`, pip is not able to resolve the pytorch dependency: ``` INFO: pip is looking at multiple versions of polyfuzz[flair] to determine which version...

I have two questions: 1. The precision-recall curve is a trade-off between the min similarity and the percentage matched. So in the ideal case you want both the precision...

I am learning how to use PolyFuzz with T5 embeddings. I am getting an error when using the following code: ``` polyfuzz: 0.4.0 transformers: 4.26.1 torch: 1.13.1+cu117 tensorflow: 2.11.0 tensorflow_hub: 0.12.0...

I am on a mission to collect **real-world use cases** of BERTopic, KeyBERT, and PolyFuzz. For that, I can use your help! Sharing your use case will drive development and...

Are there any plans to support clustering words by their similarity, as in the solution described here: https://stats.stackexchange.com/questions/123060/clustering-a-long-list-of-strings-words-into-similarity-groups

Hi, I would like to understand which matching algorithm or model is agnostic to word order. I realised, for instance, that Levenshtein distance might be affected by word order. Thanks
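
To illustrate the word-order concern raised above, here is a minimal sketch in plain Python. It is not PolyFuzz's API: `levenshtein` and `token_sort` are hypothetical helpers written for this example, and sorting tokens before comparing is just one common way to reduce sensitivity to word order.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def token_sort(text: str) -> str:
    """Hypothetical helper: lowercase, split on whitespace, sort tokens, rejoin."""
    return " ".join(sorted(text.lower().split()))


left, right = "new york city", "city new york"

# Comparing the raw strings penalises the reordered words.
raw_distance = levenshtein(left, right)

# Sorting the tokens first makes the two strings identical.
sorted_distance = levenshtein(token_sort(left), token_sort(right))

print(raw_distance, sorted_distance)
```

The raw comparison reports a non-zero distance even though both strings contain the same words, while the token-sorted comparison reports zero. Whether this preprocessing suits a given dataset depends on whether word order carries meaning there.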

Hi, I was running the PolyFuzz TF-IDF model to get the matches, but a few rows of the result were not sorted by the top_n similarity score. ```python tfidf_model = PolyFuzz(tfidf_matcher)...

When using the `TFIDF` model, the `min_similarity` parameter does not seem to be applied to the results. Minimal example that reproduces the problem (polyfuzz 0.4.0): ```python from polyfuzz import PolyFuzz from...
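
As a rough illustration of what applying a minimum-similarity threshold to match results could look like, here is a sketch in plain Python. It does not reproduce the truncated example above and is not the library's implementation: `apply_min_similarity` is a hypothetical helper, the `From`/`To`/`Similarity` keys are assumed names for the match fields, and the scores are made-up toy values.

```python
def apply_min_similarity(matches, min_similarity=0.75):
    """Keep a match only if its score reaches the threshold; otherwise blank it out."""
    filtered = []
    for match in matches:
        if match["Similarity"] >= min_similarity:
            filtered.append(dict(match))
        else:
            # Below the threshold: keep the source string but drop the match.
            filtered.append({"From": match["From"], "To": None, "Similarity": 0.0})
    return filtered


# Toy records with invented scores, for illustration only.
matches = [
    {"From": "apple", "To": "apples", "Similarity": 0.91},
    {"From": "house", "To": "mouse", "Similarity": 0.42},
]

filtered = apply_min_similarity(matches, min_similarity=0.75)
for row in filtered:
    print(row)
```

A post-hoc filter like this can serve as a sanity check on whether returned scores respect the threshold, though it says nothing about why the parameter might be ignored upstream.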

Hi @MaartenGr, We recently refactored [sparse-dot-topn](https://github.com/ing-bank/sparse_dot_topn) significantly and [released v1](https://github.com/ing-bank/sparse_dot_topn/releases/tag/v1.0.0). The most significant improvements are: - Faster implementation with lower memory overhead - new bindings using Nanobind which avoids the...