How to decrease False positive matches? (process.extract / WRatio)

Open Pranav082001 opened this issue 3 years ago • 3 comments

I am using the process.extract method, which I know uses WRatio under the hood to calculate the score. In the following case I get a very high score of 90 even though the strings are hardly equal. Is there any way to fix this in WRatio?

inp_name = "america"
name_list = ["american Futures and Options Exchange"]

process.extractOne(inp_name, name_list)

Output: ('american Futures and Options Exchange', 90.0, 0)

PS: I know about other alternatives like fuzz.ratio, partial_ratio, and token_sort_ratio, but WRatio works pretty well for my use case, so any workaround would be appreciated. Thanks!

Pranav082001 avatar Aug 02 '22 06:08 Pranav082001

Maybe write your own version of WRatio that does not fall back to the partial versions of the algorithms.

maxbachmann avatar Aug 02 '22 06:08 maxbachmann
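
For context, the 90 in the question appears to come from that partial fallback: when one string is at least 1.5 times longer than the other, WRatio also scores the shorter string against substrings of the longer one and scales the result by 0.9 (or 0.6 when the length ratio exceeds 8). The sketch below reproduces that arithmetic with the standard library only. The helper name `sliding_partial_ratio` is hypothetical, and it uses a brute-force sliding window rather than the library's own matching-block implementation, so treat it as an illustration and not as fuzzywuzzy's code.

```python
from difflib import SequenceMatcher


def sliding_partial_ratio(a: str, b: str) -> float:
    """Best 0-100 similarity of the shorter string against any
    same-length window of the longer one (brute-force stand-in
    for a partial ratio, not the library implementation)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    width = len(shorter)
    best = 0.0
    for start in range(len(longer) - width + 1):
        window = longer[start:start + width]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return 100.0 * best


query = "america"
choice = "american futures and options exchange"  # lowercased, as the default preprocessing would do

# Length ratio decides whether the partial branch is used and how it is scaled.
len_ratio = max(len(query), len(choice)) / min(len(query), len(choice))
partial_scale = 0.6 if len_ratio > 8 else 0.9

# "america" occurs verbatim inside "american", so the partial score is 100.
score = sliding_partial_ratio(query, choice) * partial_scale
print(round(len_ratio, 2), score)  # 5.29 90.0
```

Because the query is an exact substring of the choice, any scorer that includes a partial comparison will rate this pair highly, regardless of how different the full strings are.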

Could you please help me? Do I need to set try_partial to False inside def WRatio? https://github.com/seatgeek/fuzzywuzzy/blob/af443f918eebbccff840b86fa606ac150563f466/fuzzywuzzy/fuzz.py#L272

Pranav082001 avatar Aug 02 '22 07:08 Pranav082001

Yes, that's what I would try.

maxbachmann avatar Aug 03 '22 14:08 maxbachmann
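
A minimal sketch of the suggestion above, assuming the goal is a WRatio-like scorer that never takes the partial branch. In the linked revision, try_partial looks like a local variable rather than an argument, so it cannot be switched off from the caller; one option is a custom scorer along these lines. The name `wratio_no_partial` and the underscore-prefixed helpers are hypothetical, and the string comparison uses `difflib.SequenceMatcher` as a stand-in, so scores may differ slightly from the library's.

```python
from difflib import SequenceMatcher

UNBASE_SCALE = 0.95  # weight WRatio applies to the token-based scores


def _ratio(a: str, b: str) -> float:
    """Plain 0-100 similarity between two strings."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def _preprocess(s: str) -> str:
    """Lowercase, turn non-alphanumerics into spaces, and trim."""
    return "".join(c if c.isalnum() else " " for c in s).lower().strip()


def _token_sort_ratio(a: str, b: str) -> float:
    """Compare the strings after sorting their tokens alphabetically."""
    return _ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def _token_set_ratio(a: str, b: str) -> float:
    """Compare shared tokens against shared-plus-remaining tokens."""
    t1, t2 = set(a.split()), set(b.split())
    common = " ".join(sorted(t1 & t2))
    combined1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    combined2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    return max(
        _ratio(common, combined1),
        _ratio(common, combined2),
        _ratio(combined1, combined2),
    )


def wratio_no_partial(s1: str, s2: str) -> int:
    """WRatio-style score built only from the non-partial comparisons."""
    p1, p2 = _preprocess(s1), _preprocess(s2)
    if not p1 or not p2:
        return 0
    base = _ratio(p1, p2)
    tsor = _token_sort_ratio(p1, p2) * UNBASE_SCALE
    tser = _token_set_ratio(p1, p2) * UNBASE_SCALE
    return round(max(base, tsor, tser))


print(wratio_no_partial("america", "american Futures and Options Exchange"))
```

Since process.extract and process.extractOne both accept a scorer argument, a function like this could be passed as scorer=wratio_no_partial. The trade-off is that genuine substring matches, which the partial branch is designed to reward, will score much lower as well.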