Levenshtein icon indicating copy to clipboard operation
Levenshtein copied to clipboard

Damerau–Levenshtein distance

Open vmstarchenko opened this issue 11 months ago • 1 comments

Hello.

Is it possible to calculate distance with symbol swaps using this library? https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance

If this is not possible now, will it be difficult to add such a feature?

Some implementation of this functionality:


def damerau_levenshtein_distance(s1, s2):
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    for i in range(-1,lenstr1+1):
        d[(i,-1)] = i+1
    for j in range(-1,lenstr2+1):
        d[(-1,j)] = j+1
 
    for i in range(lenstr1):
        for j in range(lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i,j)] = min(
                           d[(i-1,j)] + 1, # deletion
                           d[(i,j-1)] + 1, # insertion
                           d[(i-1,j-1)] + cost, # substitution
                          )
            if i and j and s1[i] == s2[j-1] and s1[i-1] == s2[j]:
                d[(i,j)] = min(d[(i,j)], d[i-2,j-2] + 1) # transposition
 
    return d[lenstr1-1,lenstr2-1]

vmstarchenko avatar Mar 21 '24 00:03 vmstarchenko

You can directly use the implementation in RapidFuzz which is the basis for this library. The wikipedia describes both the damerau levenshtein distance and the optimal string alignment distance. Both of them are available:

from rapidfuzz.distance import DamerauLevenshtein, OSA

OSA.distance(s1, s2)
DamerauLevenshtein.distance(s1, s2)

maxbachmann avatar Mar 24 '24 23:03 maxbachmann