fuzzywuzzy
How is the ratio in fuzzywuzzy calculated?
I am trying to understand how the score in fuzzywuzzy is calculated. So far I know it depends on SequenceMatcher from the difflib package, and the difflib documentation describes the score as follows:
Return a measure of the sequences’ similarity as a float in the range [0, 1].
Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T.
Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
My first question: what does the 2.0 refer to?
Also, get_opcodes returns operations tagged equal, replace, delete, and insert.
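from difflib import SequenceMatcher
# The first positional argument is isjunk, so pass None before the two strings.
s = SequenceMatcher(None, "private", "privateT")
for opcode in s.get_opcodes():
    print("%6s a[%d:%d] b[%d:%d]" % opcode)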
My second question: do any of these operations affect the ratio score?
I have read some posts, such as the one here, talking about the cost of each operation in edit distance. Is that cost considered in the fuzzywuzzy or difflib score?
thank you
As far as I know, fuzzywuzzy (FW) uses the Levenshtein similarity ratio. You can find a more detailed explanation of its logic in this amazing article.
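On the difflib side of the question, the quoted formula can be checked by hand. The sketch below is only an illustration using the two strings from the question; the variable names (M, T, manual_ratio, tags) are mine and not part of any library. It assumes M can be obtained by summing the lengths of the "equal" opcodes, which is how I read the documentation. The 2.0 is there because each matched element is present in both sequences, so 2*M matched elements out of T total gives 1.0 for identical inputs.

```python
from difflib import SequenceMatcher

a, b = "private", "privateT"
s = SequenceMatcher(None, a, b)  # None means no isjunk function

opcodes = s.get_opcodes()
tags = [tag for tag, *_ in opcodes]

# M: matched elements, counted from the 'equal' opcodes only
M = sum(i2 - i1 for tag, i1, i2, j1, j2 in opcodes if tag == "equal")
# T: total number of elements in both sequences
T = len(a) + len(b)

manual_ratio = 2.0 * M / T
print(tags, M, T, manual_ratio, s.ratio())
```

In this run only the equal span contributes to M; the insert span adds nothing to M but still counts toward T through the length of the second string, which is what lowers the ratio. No per-operation cost is applied, unlike a weighted edit distance.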