fuzzywuzzy

How is the ratio in fuzzywuzzy calculated?

Open • fatimamb opened this issue 4 years ago • 1 comment

I am trying to understand how the score in fuzzywuzzy is calculated. So far I know that it depends on SequenceMatcher from the difflib package, and the difflib documentation (at this link) describes the score as follows:

Return a measure of the sequences’ similarity as a float in the range [0, 1].

Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
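The quoted formula can be reproduced by hand from `SequenceMatcher.get_matching_blocks()`. M counts each matched element once, but T counts the elements of both sequences, and every match pairs one element of `a` with one element of `b`, so M matches cover 2·M of the T elements; the factor 2.0 is what makes identical sequences come out at exactly 1.0. A minimal sketch (the helper name `ratio_from_blocks` is hypothetical, not part of difflib):

```python
from difflib import SequenceMatcher


def ratio_from_blocks(a, b):
    sm = SequenceMatcher(None, a, b)  # first argument is isjunk
    # M: number of matched elements. Each match pairs one element of `a`
    # with one element of `b`, so it accounts for 2 of the T elements.
    m = sum(block.size for block in sm.get_matching_blocks())
    # T: total number of elements in both sequences.
    t = len(a) + len(b)
    return 2.0 * m / t if t else 1.0


a, b = "private", "privateT"
print(ratio_from_blocks(a, b))              # 2 * 7 / 15 = 0.9333...
print(SequenceMatcher(None, a, b).ratio())  # same value
```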

My first question: what does the 2.0 refer to?

Also, get_opcodes reports spans tagged equal, replace, delete (and insert), for example:

from difflib import SequenceMatcher

s = SequenceMatcher(None, "private", "privateT")  # first argument is isjunk
for opcode in s.get_opcodes():
    print("%6s a[%d:%d] b[%d:%d]" % opcode)

My second question: do any of these opcodes affect the ratio score?
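One way to check is to rebuild the ratio from `get_opcodes()` alone. Only the `equal` spans add to M; `replace`, `delete` and `insert` spans carry no separate weight, they are simply not counted as matches. A one-character replace therefore leaves two elements unmatched (one in each string) while a one-character insert or delete leaves one. A minimal sketch, with `ratio_from_opcodes` as a hypothetical helper name:

```python
from difflib import SequenceMatcher


def ratio_from_opcodes(a, b):
    sm = SequenceMatcher(None, a, b)
    # Only 'equal' spans count toward M. 'replace', 'delete' and 'insert'
    # spans have no cost of their own; they just are not counted as matches.
    m = sum(i2 - i1 for tag, i1, i2, _j1, _j2 in sm.get_opcodes() if tag == "equal")
    t = len(a) + len(b)
    return 2.0 * m / t if t else 1.0


for a, b in [("private", "privateT"), ("abcd", "abxd"), ("abcd", "abd")]:
    sm = SequenceMatcher(None, a, b)
    tags = [op[0] for op in sm.get_opcodes()]
    print(a, b, tags, ratio_from_opcodes(a, b), sm.ratio())
```

For `"abcd"` vs `"abxd"` (one replace) this gives 6/8, and for `"abcd"` vs `"abd"` (one delete) it gives 6/7.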

I have read some posts (such as here) that talk about costs in edit distance. Are such costs taken into account in the fuzzywuzzy or difflib score?
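difflib's SequenceMatcher does not assign costs to operations; it only counts matched elements. Edit-distance scores do. Below is a minimal sketch of Levenshtein distance with configurable insertion, deletion and substitution costs (`edit_distance` and `normalized_similarity` are hypothetical names). With a substitution cost of 2, a substitution is no cheaper than a deletion plus an insertion, and the normalized score (T - d) / T works out to 2 · LCS / T. This is reportedly how python-Levenshtein's `ratio` treats substitutions, but treat that as an assumption rather than documented behavior.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Levenshtein distance with configurable per-operation costs."""
    # prev is the previous DP row: prev[j] is the cost of turning the
    # first i-1 characters of a into b[:j]. Row 0 is j insertions.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * del_cost]  # turning a[:i] into "" takes i deletions
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            cur.append(min(
                prev[j] + del_cost,                       # delete a[i-1]
                cur[j - 1] + ins_cost,                    # insert b[j-1]
                prev[j - 1] + (0 if same else sub_cost),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def normalized_similarity(a, b):
    # Assumed normalization: (T - d) / T with substitution cost 2,
    # where T is the total length of both strings.
    t = len(a) + len(b)
    return (t - edit_distance(a, b, sub_cost=2)) / t if t else 1.0


print(edit_distance("kitten", "sitting"))              # 3
print(edit_distance("kitten", "sitting", sub_cost=2))  # 5
print(normalized_similarity("kitten", "sitting"))      # 8 / 13
```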

Thank you!

fatimamb • Nov 09 '20 15:11

As far as I know, fuzzywuzzy (FW) uses the Levenshtein similarity ratio. You can find more explanation of its logic in this amazing article.
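As far as I can tell, fuzzywuzzy uses python-Levenshtein when it is installed and otherwise falls back to difflib's SequenceMatcher, so the two backends can score the same pair differently. SequenceMatcher greedily takes the longest common block first, while a Levenshtein-style ratio with substitution cost 2 equals 2 · LCS / T. The sketch below compares the two on a hand-picked pair; `lcs_length`, `indel_ratio` and `score_0_100` are hypothetical helpers, and the 0-100 rounding is an assumption meant to mimic `fuzz.ratio`.

```python
from difflib import SequenceMatcher


def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence length.
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_ratio(a, b):
    # 2 * LCS / T: equal to (T - d) / T when substitutions cost 2.
    t = len(a) + len(b)
    return 2.0 * lcs_length(a, b) / t if t else 1.0


def score_0_100(r):
    # Assumed fuzz.ratio-style scaling: 100 * ratio, rounded to an int.
    return int(round(100 * r))


a, b = "abydemno", "mnoabzde"
print(score_0_100(SequenceMatcher(None, a, b).ratio()))  # 38: difflib keeps only "mno"
print(score_0_100(indel_ratio(a, b)))                    # 50: LCS is "abde"
```

Here difflib locks onto the longest block `"mno"`, which leaves nothing on either side to match, whereas the LCS `"abde"` is longer even though it is split into two pieces.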

MahmoudAliEng • Dec 13 '20 14:12