
Strange results that depend on sort order and case

Status: Open. Sovetnikov opened this issue 9 years ago • 17 comments

Sample code that reproduces the problem:

from fuzzywuzzy import process
from fuzzywuzzy.fuzz import partial_ratio

data = ["Dom na Prishvina", "Krylatskij",
        "Prigorod.Lesnoe", "Kotel'nicheskie vysotki",
        "Novaja Presnja", "Stolichnyj", "Bukinist",
        "Voznesenskij", "Oranzhvud",
        "Akadem-Palas", "Novoe Tushino",
        "Alekseevskaja roscha", "Marshal Grad",
        "Novomolokovo", "Ljubertsy 2017",
        "Malaja Ordynka 19",
        "Rezidentsija na Vsevolozhskom",
        "Kashintsevo", "Sojuznyj",
        "Michurino-Zapad",
        "Tat'janin Park",
        "Peredelkino Blizhnee",
        "Rozhdestvenskij",
        "Vostochnoe Butovo",
        "Nemchinovka-rezidents",
        "LIFE-Mitinskaja ECOPARK",
        "Nagornaja 7", "TehnoPark",
        "Tarasovskaja, 2",
        "Dom na VDNH", "Polet",
        "VLjublino", "Letchika Babushkina 17",
        "Dacha Shatena", "Versis",
        "AFI Residence Paveletskaya", "Tarasovskaja, 25", ]

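# 1: default scorer and processor, upper-case query
print('1 %s' % process.extractBests('AFI', data))
# 2: partial_ratio scorer, upper-case query
print('2 %s' % process.extractBests('AFI', data, scorer=partial_ratio))
# 3: partial_ratio scorer, lower-case query
print('3 %s' % process.extractBests('afi', data, scorer=partial_ratio))
# 4: default scorer, lower-case query, choices sorted alphabetically
print('4 %s' % process.extractBests('afi', sorted(data)))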

The output is:

1 [("Tat'janin Park", 60), ('AFI Residence Paveletskaya', 60), ('Krylatskij', 31), ('Dom na Prishvina', 30), ('Prigorod.Lesnoe', 30)]
2 [('Dom na Prishvina', 0), ('Krylatskij', 0), ('Prigorod.Lesnoe', 0), ("Kotel'nicheskie vysotki", 0), ('Novaja Presnja', 0)]
3 [('AFI Residence Paveletskaya', 100), ("Tat'janin Park", 67), ('Dom na Prishvina', 33), ('Krylatskij', 33), ('Prigorod.Lesnoe', 33)]
4 [('AFI Residence Paveletskaya', 60), ("Tat'janin Park", 60), ('Krylatskij', 31), ('Akadem-Palas', 30), ('Alekseevskaja roscha', 30)]
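One reading of outputs 1 and 4 is sketched below, under stated assumptions. As I understand fuzzywuzzy's default scorer (WRatio), when the strings differ in length it scales the partial-match score by 0.9, or by 0.6 when one string is more than 8 times longer than the other. Treat that rule as an assumption rather than a quote of the library. The partial scores 100 and 67 are taken from output 3 above. The helper names `scaled_partial` and `top` and the dict `PARTIAL` are hypothetical. The sketch also assumes that on equal scores the item seen first wins, as it does with `heapq.nlargest`.

```python
import heapq


def scaled_partial(query, choice, partial_score):
    """Hypothetical WRatio-style partial term (assumed rule, not library code).

    Scales the partial score by 0.9, or by 0.6 when one string is more than
    8 times longer than the other, then rounds to an int.
    """
    len_ratio = max(len(query), len(choice)) / min(len(query), len(choice))
    scale = 0.6 if len_ratio > 8 else 0.9
    return int(round(partial_score * scale))


# partial_ratio scores for the query "afi", copied from output 3 of the issue.
PARTIAL = {"Tat'janin Park": 67, "AFI Residence Paveletskaya": 100}

# Relative order in `data`: Tat'janin Park comes first ...
original_order = ["Tat'janin Park", "AFI Residence Paveletskaya"]
# ... while sorted(data) puts "AFI ..." first ('A' < 'T').
sorted_order = sorted(original_order)


def top(order):
    """Score each name, then keep the best two; ties keep input order."""
    scored = [(name, scaled_partial("afi", name, PARTIAL[name])) for name in order]
    return heapq.nlargest(2, scored, key=lambda pair: pair[1])


print(top(original_order))  # [("Tat'janin Park", 60), ('AFI Residence Paveletskaya', 60)]
print(top(sorted_order))    # [('AFI Residence Paveletskaya', 60), ("Tat'janin Park", 60)]
```

Under these assumptions, 67 × 0.9 = 60.3 and 100 × 0.6 = 60 both round to 60, so the exact match is penalized for being attached to a long string, and the tie is then broken purely by input order.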

  1. It seems strange that process.extractBests('AFI', data) gives ("Tat'janin Park", 60) and ('AFI Residence Paveletskaya', 60) the same score.
  2. It is strange that process.extractBests('AFI', data, scorer=partial_ratio) does not find "AFI Residence Paveletskaya" (every choice scores 0).
  3. It is very strange that process.extractBests('afi', data, scorer=partial_ratio) does find "AFI Residence Paveletskaya", with a score of 100!
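Outputs 2 and 3 are consistent with one hypothesis: the choices were lower-cased by the default processor while the query was passed through unchanged, and partial_ratio compares characters case-sensitively. The sketch below is not fuzzywuzzy's code. `simple_partial_ratio` is a hypothetical, simplified stand-in built on `difflib.SequenceMatcher`: it slides a window the length of the shorter string across the longer one and keeps the best ratio. The "only the choice is lower-cased" step is an assumption.

```python
from difflib import SequenceMatcher


def simple_partial_ratio(short, long):
    """Hypothetical stand-in for partial_ratio (case-sensitive).

    Slides a window the size of the shorter string over the longer one and
    returns the best SequenceMatcher ratio, scaled to 0-100.
    """
    if len(short) > len(long):
        short, long = long, short
    if not short:
        return 0
    best = 0.0
    for start in range(len(long) - len(short) + 1):
        window = long[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return int(round(100 * best))


choice = "AFI Residence Paveletskaya"
# Assumption: the processor lower-cases the choice but not the query.
processed_choice = choice.lower()

print(simple_partial_ratio("AFI", processed_choice))          # 0: no shared characters
print(simple_partial_ratio("afi", processed_choice))          # 100
print(simple_partial_ratio("AFI".lower(), processed_choice))  # 100 once both sides are normalized
```

If this hypothesis holds, normalizing the query and the choices the same way before scoring would make cases 2 and 3 agree.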

Sovetnikov, Oct 14 '16 21:10