
Using process.extract for Chinese returns wrong results

Open PalaChen opened this issue 6 years ago • 1 comments

Using Python 2, for example:

` choices = [u"星球大战", u"5月4日星球大战", u"星球大戰", u"战大球星", u"星球大战游戏下"]
process.extract(u"星球大战", choices)

[(u'星球大战', 0), (u'5月4日星球大战', 0), (u'星球大戰', 0), (u'战大球星', 0), (u'星球大战游戏下', 0)] `

but

fuzz.ratio(u"星球大战", u"星球大战1") returns 89

PalaChen avatar Jun 11 '19 03:06 PalaChen

The default scorer used by process.extract is fuzz.WRatio, which by default converts all non-ASCII characters to whitespace and trims them. In your case this means you are comparing empty strings, so every score is 0. To keep the Chinese characters, use:

process.extract(u"星球大战", choices, scorer=fuzz.UWRatio)

or, since you mentioned fuzz.ratio:

process.extract(u"星球大战", choices, scorer=fuzz.ratio)
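
To make the effect of the ASCII-only preprocessing concrete, here is a minimal standard-library sketch. It is not fuzzywuzzy's actual code: `ascii_only_process` and `simple_ratio` are hypothetical helpers written for illustration, and the scoring uses `difflib.SequenceMatcher` as a stand-in for a ratio scorer.

```python
# -*- coding: utf-8 -*-
import re
from difflib import SequenceMatcher


def ascii_only_process(text):
    """Hypothetical stand-in for an ASCII-forcing preprocessor (not fuzzywuzzy's code)."""
    # Drop every character that cannot be encoded as ASCII.
    ascii_text = text.encode("ascii", "ignore").decode("ascii")
    # Replace anything that is not a letter or digit with a space, then lowercase and trim.
    return re.sub(r"[^a-zA-Z0-9]", " ", ascii_text).lower().strip()


def simple_ratio(a, b):
    """Similarity on a 0-100 scale, based on difflib's SequenceMatcher."""
    if not a or not b:
        return 0  # nothing to compare
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


query = u"星球大战"
choices = [u"星球大战", u"5月4日星球大战", u"星球大戰", u"战大球星", u"星球大战游戏下"]

# After ASCII-only preprocessing the query is empty, so every comparison scores 0.
processed_query = ascii_only_process(query)
stripped_scores = [simple_ratio(processed_query, ascii_only_process(c)) for c in choices]

# Comparing the raw Unicode strings keeps the Chinese characters and gives useful scores.
raw_scores = [simple_ratio(query, c) for c in choices]
```

In this sketch `processed_query` ends up as an empty string, which is why a scorer that forces ASCII has nothing left to match against, while a scorer that works on the original Unicode text does.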

maxbachmann avatar Dec 17 '20 13:12 maxbachmann