fuzzywuzzy
Strange results that depend on sort order and case
This is sample code:
from fuzzywuzzy import process
from fuzzywuzzy.fuzz import partial_ratio
data = ["Dom na Prishvina", "Krylatskij",
"Prigorod.Lesnoe", "Kotel'nicheskie vysotki",
"Novaja Presnja", "Stolichnyj", "Bukinist",
"Voznesenskij", "Oranzhvud",
"Akadem-Palas", "Novoe Tushino",
"Alekseevskaja roscha", "Marshal Grad",
"Novomolokovo", "Ljubertsy 2017",
"Malaja Ordynka 19",
"Rezidentsija na Vsevolozhskom",
"Kashintsevo", "Sojuznyj",
"Michurino-Zapad",
"Tat'janin Park",
"Peredelkino Blizhnee",
"Rozhdestvenskij",
"Vostochnoe Butovo",
"Nemchinovka-rezidents",
"LIFE-Mitinskaja ECOPARK",
"Nagornaja 7", "TehnoPark",
"Tarasovskaja, 2",
"Dom na VDNH", "Polet",
"VLjublino", "Letchika Babushkina 17",
"Dacha Shatena", "Versis",
"AFI Residence Paveletskaya", "Tarasovskaja, 25", ]
print('1 %s' % process.extractBests('AFI', data))
print('2 %s' % process.extractBests('AFI', data, scorer=partial_ratio))
print('3 %s' % process.extractBests('afi', data, scorer=partial_ratio))
print('4 %s' % process.extractBests('afi', sorted(data)))
Code output is:
1 [("Tat'janin Park", 60), ('AFI Residence Paveletskaya', 60), ('Krylatskij', 31), ('Dom na Prishvina', 30), ('Prigorod.Lesnoe', 30)]
2 [('Dom na Prishvina', 0), ('Krylatskij', 0), ('Prigorod.Lesnoe', 0), ("Kotel'nicheskie vysotki", 0), ('Novaja Presnja', 0)]
3 [('AFI Residence Paveletskaya', 100), ("Tat'janin Park", 67), ('Dom na Prishvina', 33), ('Krylatskij', 33), ('Prigorod.Lesnoe', 33)]
4 [('AFI Residence Paveletskaya', 60), ("Tat'janin Park", 60), ('Krylatskij', 31), ('Akadem-Palas', 30), ('Alekseevskaja roscha', 30)]
- It seems strange to me that process.extractBests('AFI', data) gives the same score to both ("Tat'janin Park", 60) and ('AFI Residence Paveletskaya', 60).
- It is strange that process.extractBests('AFI', data, scorer=partial_ratio) does not find "AFI Residence Paveletskaya".
- It is very strange that process.extractBests('afi', data, scorer=partial_ratio) does find "AFI Residence Paveletskaya"!
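A possible explanation, sketched with the standard library only so it does not depend on any particular fuzzywuzzy version. It assumes two things that I have not verified against the installed release: that the default processor lowercases the choices and replaces punctuation with spaces while the query is passed to partial_ratio unchanged, and that the default scorer weights the partial score by 0.9, or by 0.6 when one string is more than 8 times longer than the other. The helpers simple_partial_ratio and partial_scale are hypothetical stand-ins, not fuzzywuzzy functions.

```python
from difflib import SequenceMatcher


def simple_partial_ratio(query, choice):
    """Best 0-100 similarity between the shorter string and any
    equal-length window of the longer one (case-sensitive).
    Simplified stand-in for partial_ratio, not the library code."""
    shorter, longer = sorted((query, choice), key=len)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


def partial_scale(query, choice):
    """Assumed weight for the partial score in the default scorer:
    0.9 normally, 0.6 when the length ratio exceeds 8."""
    len_ratio = max(len(query), len(choice)) / min(len(query), len(choice))
    return 0.6 if len_ratio > 8 else 0.9


# Choices as they would look after lowercasing and stripping punctuation.
afi_choice = "afi residence paveletskaya"
park_choice = "tat janin park"

# Case 2: an uppercase query shares no characters with a lowercased choice.
print(simple_partial_ratio("AFI", afi_choice))

# Case 3: a lowercase query matches the leading "afi" window exactly.
print(simple_partial_ratio("afi", afi_choice))

# Case 1: different weights can pull different partial scores to the same value.
for choice in (afi_choice, park_choice):
    weighted = simple_partial_ratio("afi", choice) * partial_scale("afi", choice)
    print(choice, int(round(weighted)))
```

If these assumptions hold, the sketch reproduces the 0 and 100 from cases 2 and 3, and both choices land on 60 in case 1 (100 × 0.6 versus 67 × 0.9). The different order of the two tied entries in cases 1 and 4 would then simply follow the input order.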