algovision
Levenshtein Distance implementation
Hi, amazing paper and thanks for the work you have done.
I am trying to implement Levenshtein Distance, but I have been totally unsuccessful. I cannot find the IndexAssign2D
mentioned in the paper. I tried to follow the examples and recreate it, but without success. I will put the code at the end. If you already have working code somewhere, I would love to see it :)
# Imports mirroring the names used below (as I understood them from the examples).
from algovision import Algorithm, Input, Output, Var, VarInt, For, Let, Print
import torch

levenshtein_distance = Algorithm(
    Input('inp'),
    Input('out'),
    # get length of the first string
    VarInt('s', lambda inp: inp.shape[-1]),
    # get length of the second string
    VarInt('t', lambda out: out.shape[-1]),
    # Not sure how to assign the max length to 'n'
    VarInt('n', lambda t: t),
    # can't use 'n' here to instantiate a dynamic tensor
    # Var('d', torch.zeros((n, n))),
    Var('d', torch.zeros((5, 5))),
    # ====== unable to move ahead from here ======
    # For('i', 'n',
    #     # set d[i+1, 0] to i+1
    #     Let('d', lambda i: [i + 1, i * 0], lambda i: i + 1)),
    # Print(lambda d: d),
    Output('d'),
    beta=1.25,
)

# int representation of the input strings
i, o = torch.tensor([[3, 4, 1]]), torch.tensor([[8, 3, 4, 1]])
levenshtein_distance(i, o)
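For reference, the classical (non-differentiable) recurrence I am trying to relax is `d[i+1][j+1] = min(d[i][j+1] + 1, d[i+1][j] + 1, d[i][j] + subs_cost)`. A minimal plain-Python sketch (the name `levenshtein_ref` is just illustrative, not part of algovision):

def levenshtein_ref(s, t):
    # Classical Wagner-Fischer dynamic program over a (len(s)+1) x (len(t)+1) matrix.
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i  # cost of deleting i characters
    for j in range(len(t) + 1):
        d[0][j] = j  # cost of inserting j characters
    for i in range(len(s)):
        for j in range(len(t)):
            subs_cost = 0 if s[i] == t[j] else 1
            d[i + 1][j + 1] = min(d[i][j + 1] + 1,      # deletion
                                  d[i + 1][j] + 1,      # insertion
                                  d[i][j] + subs_cost)  # substitution
    return d[len(s)][len(t)]

assert levenshtein_ref([3, 4, 1], [8, 3, 4, 1]) == 1  # one insertion of 8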
Hi, sorry for the late response; here is an implementation of it. Regarding the supplementary material of the paper: after publishing the paper, we made some changes for the code release to make the library more user-friendly, which didn't make it into the paper.
You can find an implementation of Levenshtein distance below:
from algovision import (
    Algorithm, Input, Output, Var,  # core
    CosineSimilarity,               # conditions
    If, For,                        # control_structures
    Let, Min,                       # functions
)
import torch
levenshtein = Algorithm(
    # Define the variables the input corresponds to.
    Input('array_s'),
    Input('array_t'),
    # Declare and initialize all differentiable variables.
    Var('d', lambda array_s, array_t:
        torch.zeros(array_s.shape[1] + 1, array_t.shape[1] + 1)
    ),
    Var('subs_cost', torch.tensor(0.)),
    Var('return_d', torch.tensor(0.)),
    # Initialize the borders of the cost matrix.
    For('i', lambda d: d.shape[1] - 1,
        Let('d', [lambda i: i + 1, 0], lambda i: i + 1)
    ),
    For('j', lambda d: d.shape[2] - 1,
        Let('d', [0, lambda j: j + 1], lambda j: j + 1)
    ),
    # Run the dynamic programming algorithm.
    For('i', lambda d: d.shape[1] - 1,
        For('j', lambda d: d.shape[2] - 1,
            # Differentiable check for equality of two categorical embeddings.
            If(CosineSimilarity(
                lambda array_s, i: array_s[:, i],
                lambda array_t, j: array_t[:, j]
            ),
                if_true=Let('subs_cost', 0),
                if_false=Let('subs_cost', 1),
            ),
            # Corresponds to
            # `d[i+1, j+1] = min(d[i, j+1] + 1, d[i+1, j] + 1, d[i, j] + subs_cost)`.
            Let(
                'd',
                [lambda i: i + 1, lambda j: j + 1],
                lambda d, i, j, subs_cost: Min(beta=5)(d[:, i, j+1] + 1,
                                                       d[:, i+1, j] + 1,
                                                       d[:, i, j] + subs_cost)
            )
        ),
    ),
    # Return the distance and the cost matrix.
    Let('return_d', 'd', [lambda d: d.shape[1] - 1, lambda d: d.shape[2] - 1]),
    Output('return_d'),
    Output('d'),
    beta=5,
)
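# Note on the equality check above: `array_s` and `array_t` hold (near) one-hot
# categorical embeddings, so the cosine similarity of two symbol vectors is close
# to 1 when they encode the same symbol and close to 0 otherwise; the If then
# blends the two branches, which makes the (in)equality test differentiable.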
if __name__ == '__main__':
    import matplotlib.pyplot as plt

    torch.manual_seed(0)
    array_s = torch.softmax(1e10 * torch.randn(1, 8, 2), dim=-1)  # .requires_grad_(True)
    array_t = torch.softmax(1e10 * torch.randn(1, 6, 2), dim=-1)  # .requires_grad_(True)
    array_s[:, 2:] = array_t
    print(array_s)
    print(array_t)

    ret_d, ret_d_mat = levenshtein(array_s, array_t)
    print(ret_d)
    print(ret_d_mat)

    plt.imshow(ret_d_mat.detach()[0].T)
    plt.show()
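For intuition on beta: Min(beta=...) acts as a smooth relaxation of the hard minimum, and larger beta makes it sharper. The standalone sketch below uses the standard logsumexp-based softmin to illustrate the smoothing behavior; this exact formula is an assumption for illustration, not necessarily algovision's internal definition:

import torch

def softmin(x, beta):
    # Logsumexp-based smooth minimum: approaches the hard min as beta -> infinity.
    return -torch.logsumexp(-beta * x, dim=-1) / beta

x = torch.tensor([1.0, 2.0, 3.0])
print(softmin(x, 1.0))   # ~0.59: a smooth underestimate of the hard min 1.0
print(softmin(x, 50.0))  # ~1.00: converges to the hard minimum for large beta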