abydos
DiscountedLevenshtein can be less than Levenshtein...?
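```python
from abydos.distance import DiscountedLevenshtein, Levenshtein

lev = Levenshtein()
dlev = DiscountedLevenshtein()
# Compare the normalized distances between 'cat' and 'hat'
print(lev.dist('cat', 'hat') < dlev.dist('cat', 'hat'))
```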
Is this correct, though?
Also, this alignment seems suboptimal. (I think the l in Neil should be matched with an l in Niall.)
```python
>>> cmp.alignment('Niall', 'Neil')
(2.526064024369237, 'N-iall', 'Neil--')
```
Fixed the alignment issue in b04ca90b.
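For comparison, here is a minimal sketch of a plain unit-cost Levenshtein alignment with a standard traceback. It is not abydos's `alignment` method and ignores discounting entirely; the function name `levenshtein_alignment` is hypothetical. On this pair it finds a cost-3 alignment that pairs the l of Neil with an l of Niall:

```python
from typing import List, Tuple


def levenshtein_alignment(src: str, tar: str) -> Tuple[int, str, str]:
    """Unit-cost Levenshtein distance plus one optimal alignment ('-' = gap)."""
    n, m = len(src), len(tar)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of src[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of tar[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + (src[i - 1] != tar[j - 1]),  # match/substitution
            )

    # Walk back from the bottom-right corner, preferring diagonal steps
    i, j = n, m
    out_src: List[str] = []
    out_tar: List[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != tar[j - 1]):
            out_src.append(src[i - 1])
            out_tar.append(tar[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            out_src.append(src[i - 1])
            out_tar.append("-")
            i -= 1
        else:
            out_src.append("-")
            out_tar.append(tar[j - 1])
            j -= 1
    return d[n][m], "".join(reversed(out_src)), "".join(reversed(out_tar))


if __name__ == "__main__":
    print(levenshtein_alignment("Niall", "Neil"))  # (3, 'Niall', 'N-eil')
```

Optimal alignments are rarely unique. Preferring diagonal steps during the traceback is only one tie-breaking rule; `Niall`/`Neil-` (pairing N/N, i/e, a/i, l/l) has the same cost of 3.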
This is a result of the normalizing term in combination with the discounting function, so it's not a bug. It is still worth re-examining whether the supplied discounting functions are well chosen.
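A small numeric sketch of how that can happen, under loudly hypothetical assumptions: an edit at 0-based position k costs `1 / (1 + k)`, and the distance is normalized by the sum of those weights over the longer string. abydos's actual discount functions and normalizer may differ; all names below are illustrative only.

```python
from typing import Callable

Weight = Callable[[int], float]


def inverse_position(k: int) -> float:
    # Hypothetical discount: full cost at position 0, then 1/2, 1/3, ...
    return 1.0 / (1 + k)


def no_discount(k: int) -> float:
    # Every position costs 1, i.e. ordinary Levenshtein
    return 1.0


def weighted_levenshtein(src: str, tar: str, weight: Weight) -> float:
    """Edit distance where an edit at 0-based position k costs weight(k)."""
    n, m = len(src), len(tar)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + weight(i - 1)  # delete src[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + weight(j - 1)  # insert tar[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if src[i - 1] == tar[j - 1] else weight(max(i, j) - 1)
            d[i][j] = min(
                d[i - 1][j] + weight(i - 1),  # deletion
                d[i][j - 1] + weight(j - 1),  # insertion
                d[i - 1][j - 1] + sub,  # match or substitution
            )
    return d[n][m]


def normalized_distance(src: str, tar: str, weight: Weight) -> float:
    """Divide by the discounted length of the longer string."""
    longest = max(len(src), len(tar))
    if longest == 0:
        return 0.0
    normalizer = sum(weight(k) for k in range(longest))
    return weighted_levenshtein(src, tar, weight) / normalizer


if __name__ == "__main__":
    plain = normalized_distance("cat", "hat", no_discount)
    discounted = normalized_distance("cat", "hat", inverse_position)
    print(f"{plain:.3f} {discounted:.3f}")  # 0.333 0.545
```

The single substitution in `cat`/`hat` sits at position 0, where the discount does nothing, so the absolute cost stays 1 while the normalizer drops from 3 to 1 + 1/2 + 1/3 = 11/6. Dividing the discounted cost by the plain length (3) instead would give 1/3 here, and in this sketch could never exceed the plain normalized distance, since every weight is at most 1.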
Do you know of any code examples that use abydos to match two Python lists of strings by minimal distance?
```python
longRefList = [f"Name {i:04d}" for i in range(1, 10000)]  # "Name 0001" ... "Name 9999"
mylist = ["Name 2345", "xdsdfj ABCD", "Name x23f"]

# ... whatever code calculates, for each item in mylist,
# the distance to and position of the closest item in longRefList
# ... to output something like this:
matchOutput = [
    {"dist": 0, "position": 2344},
    {"dist": 0.999, "position": 8831},
    {"dist": 0.5, "position": 230},
]
```
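One brute-force way to produce that output is sketched below. It uses a pure-Python normalized Levenshtein distance as a stand-in, the helper names are hypothetical, and `dist` accepts any callable that takes two strings and returns a float. With abydos installed, passing the `dist` method of a measure instance (e.g. `Levenshtein().dist`, as in the first example in this thread) should fit that slot; `ReesLevenshtein().dist` presumably does too, assuming it follows the same interface.

```python
from typing import Callable, Dict, List, Union

Distance = Callable[[str, str], float]
Match = Dict[str, Union[int, float]]


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, keeping only one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance in [0, 1]: number of edits divided by the longer length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def closest_matches(queries: List[str], references: List[str],
                    dist: Distance = normalized_levenshtein) -> List[Match]:
    """For each query, find the distance to and position of its nearest reference."""
    output: List[Match] = []
    for query in queries:
        best_pos, best_dist = -1, float("inf")
        for pos, ref in enumerate(references):
            d = dist(query, ref)
            if d < best_dist:  # strict '<' keeps the first of several ties
                best_pos, best_dist = pos, d
                if d == 0:
                    break  # an exact match cannot be beaten
        output.append({"dist": best_dist, "position": best_pos})
    return output


longRefList = [f"Name {i:04d}" for i in range(1, 10000)]
mylist = ["Name 2345", "xdsdfj ABCD", "Name x23f"]
matchOutput = closest_matches(mylist, longRefList)
print(matchOutput[0])  # {'dist': 0.0, 'position': 2344}
```

This makes one distance call per (query, reference) pair, so roughly 3 × 9,999 calls here, minus the early exit on exact matches.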
I am particularly interested in using the ReesLevenshtein distance, but I wonder how slow this could be. Do you know whether anyone has tried using abydos to merge pandas DataFrames by minimal-distance matching between two columns?
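On speed: brute force is quadratic in the list sizes, but for plain unit-cost Levenshtein two standard bounds can prune much of the work. The length difference is a lower bound on the distance, and the minimum of a DP row never decreases from one row to the next, so a computation can stop as soon as it exceeds the best distance found so far. The sketch below is pure Python, not abydos, and its helper names are hypothetical; these bounds are not assumed to hold for ReesLevenshtein or other abydos measures.

```python
from typing import List, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Levenshtein distance if it is at most max_dist, otherwise None."""
    if abs(len(a) - len(b)) > max_dist:
        return None  # the length difference is a lower bound on the distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(cur) > max_dist:
            return None  # row minima never decrease, so give up early
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None


def closest_index(query: str, references: List[str]) -> Tuple[int, Optional[int]]:
    """Position and distance of the first nearest reference, or (-1, None)."""
    best_pos: int = -1
    best_dist: Optional[int] = None
    for pos, ref in enumerate(references):
        # Before any match, accept anything; afterwards only strictly better ones
        limit = max(len(query), len(ref)) if best_dist is None else best_dist - 1
        d = bounded_levenshtein(query, ref, limit)
        if d is not None:
            best_pos, best_dist = pos, d
            if d == 0:
                break  # an exact match cannot be beaten
    return best_pos, best_dist


if __name__ == "__main__":
    refs = [f"Name {i:04d}" for i in range(1, 10000)]
    for q in ["Name 2345", "xdsdfj ABCD", "Name x23f"]:
        print(q, closest_index(q, refs))
```

For merging pandas DataFrames, one untested option is to store each row's returned position in a new column (here a hypothetical `match_pos`) and join with `left.merge(right, left_on="match_pos", right_index=True, how="left")`.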
Thanks a lot in advance for your advice. @abubelinha