Levenshtein
Levenshtein copied to clipboard
Damerau–Levenshtein distance
Hello.
Is it possible to calculate distance with symbol swaps using this library? https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
If this is not possible now, will it be difficult to add such a feature?
Some implementation of this functionality:
def damerau_levenshtein_distance(s1, s2):
d = {}
lenstr1 = len(s1)
lenstr2 = len(s2)
for i in range(-1,lenstr1+1):
d[(i,-1)] = i+1
for j in range(-1,lenstr2+1):
d[(-1,j)] = j+1
for i in range(lenstr1):
for j in range(lenstr2):
if s1[i] == s2[j]:
cost = 0
else:
cost = 1
d[(i,j)] = min(
d[(i-1,j)] + 1, # deletion
d[(i,j-1)] + 1, # insertion
d[(i-1,j-1)] + cost, # substitution
)
if i and j and s1[i] == s2[j-1] and s1[i-1] == s2[j]:
d[(i,j)] = min(d[(i,j)], d[i-2,j-2] + 1) # transposition
return d[lenstr1-1,lenstr2-1]
You can directly use the implementation in RapidFuzz
which is the basis for this library. The wikipedia describes both the damerau levenshtein distance and the optimal string alignment distance. Both of them are available:
from rapidfuzz.distance import DamerauLevenshtein, OSA
OSA.distance(s1, s2)
DamerauLevenshtein.distance(s1, s2)