fuzzywuzzy
user process.extract returns wrong results for Chinese
user For example, with Python 2:
` choices = [u"星球大战", u"5月4日星球大战", u"星球大戰", u"战大球星", u"星球大战游戏下"]
process.extract(u"星球大战", choices)
[(u'星球大战', 0), (u'5月4日星球大战', 0), (u'星球大戰', 0), (u'战大球星', 0), (u'星球大战游戏下', 0)] `
but calling fuzz.ratio directly gives a sensible score:
fuzz.ratio(u"星球大战", u"星球大战1")
89
The default scorer selected by process.extract is fuzz.WRatio, which by default converts all non-ASCII characters to whitespace and trims them. So in your case you're comparing empty strings, and every choice scores 0. Instead, use:
process.extract(u"星球大战", choices, scorer=fuzz.UWRatio)
or, since you mentioned fuzz.ratio:
process.extract(u"星球大战", choices, scorer=fuzz.ratio)
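To see why every score came out as 0, here is a rough standard-library sketch of what ASCII-forcing preprocessing does to these strings. The function `ascii_only_preprocess` is a hypothetical stand-in written for illustration; it is not fuzzywuzzy's own code, and the library's actual preprocessing may differ in detail.

```python
# -*- coding: utf-8 -*-
import re


def ascii_only_preprocess(text):
    """Hypothetical stand-in for an ASCII-forcing preprocessor (not fuzzywuzzy's code)."""
    # Drop every character outside the ASCII range.
    ascii_text = text.encode("ascii", "ignore").decode("ascii")
    # Replace anything that is not an ASCII letter or digit with a space,
    # then trim the surrounding whitespace.
    return re.sub(r"[^a-z0-9]", " ", ascii_text.lower()).strip()


choices = [u"星球大战", u"5月4日星球大战", u"星球大戰", u"战大球星", u"星球大战游戏下"]

# The query itself collapses to an empty string, so nothing is left to compare.
query = ascii_only_preprocess(u"星球大战")
processed = [ascii_only_preprocess(c) for c in choices]
```

Under these assumptions the query is reduced to an empty string and only the ASCII digits of the second choice survive, which is consistent with all five scores being 0. A Unicode-aware scorer such as fuzz.UWRatio avoids this step.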