I don't understand what is returned

Open raffaem opened this issue 7 months ago • 5 comments

The docs provide the following example:

from polyfuzz import PolyFuzz

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

model = PolyFuzz("TF-IDF").match(from_list, to_list)

which returns

>>> model.get_matches()
         From      To    Similarity
0       apple   apple    1.000000
1      apples  apples    1.000000
2        appl   apple    0.783751
3       recal    None    0.000000
4       house   mouse    0.587927
5  similarity    None    0.000000

I don't understand what is returned. If PolyFuzz is doing a pairwise comparison from from_list to to_list, it should return len(from_list) * len(to_list) rows, so 6 * 3 = 18 results in this case.

In the example, apple->apples, apple->mouse, apples->apple, apples->mouse and so on are missing.
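To make the expectation concrete, here is a minimal sketch of the full pairwise table next to a one-row-per-From reduction. It does not use PolyFuzz: `score` is a hypothetical stand-in built on `difflib.SequenceMatcher`, not the TF-IDF similarity, so the numbers will differ from the output above. The six rows shown above look like the second shape (one best match per From entry), though that is only an inference from the printed result.

```python
from difflib import SequenceMatcher
from itertools import product

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]


def score(a: str, b: str) -> float:
    """Stand-in similarity (assumption): NOT PolyFuzz's TF-IDF score."""
    return SequenceMatcher(None, a, b).ratio()


# Full pairwise comparison: one row for every (from, to) combination.
pairs = [(f, t, score(f, t)) for f, t in product(from_list, to_list)]

# Reduction: keep only the highest-scoring "to" entry for each "from" entry.
best = {}
for f, t, s in pairs:
    if f not in best or s > best[f][1]:
        best[f] = (t, s)

print(len(pairs))  # number of rows in the full pairwise table
print(len(best))   # number of rows when only the best match is kept
for f, (t, s) in best.items():
    print(f"{f:>10} -> {t:<6} {s:.3f}")
```

The `pairs` list is the len(from_list) * len(to_list) result I expected, while `best` has the same number of rows as `model.get_matches()` returned.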

raffaem • Dec 06 '23 16:12