python-Levenshtein

Q: edit distance between 2 lists

Open yasheshgaur opened this issue 6 years ago • 4 comments

Can someone please tell me which API calculates the edit distance between two lists? I only see the API for two strings, and in my use case it is not possible to convert the lists into strings.
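Levenshtein distance is defined for any two sequences, not only strings. So one way to get a list-level distance without python-Levenshtein is a plain dynamic-programming table that compares list elements with ==. This is a minimal sketch, not part of the library's API. The name list_edit_distance is hypothetical, and it runs in O(len(a) * len(b)) time in pure Python.

```python
from typing import Sequence

def list_edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences whose elements support ==.
    Hypothetical helper, not part of python-Levenshtein."""
    # prev[j] = distance between the previous prefix of a and b[:j];
    # initially the prefix of a is empty, so j insertions are needed.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # a[:i] -> empty prefix of b needs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # keep x or replace it with y
        prev = curr
    return prev[-1]

source = "The dog jumped over the moon".split()
target = "The Cat ate the moon".split()
print(list_edit_distance(source, target))  # 3
```

Because elements are only compared with ==, the lists can hold words, numbers, or any other objects.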

yasheshgaur • Jun 15 '18

I am not sure your problem is well defined yet. Are you wanting to compare the elements of your list componentwise and then report the sum? What would an example look like?
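To illustrate the distinction being asked about: a componentwise comparison only counts the positions where two equal-length lists differ, while an edit distance can also insert and delete elements. Here is a minimal sketch of the componentwise version, using a hypothetical helper name:

```python
def componentwise_mismatches(a, b):
    """Count positions where two equal-length lists differ (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("componentwise comparison needs lists of equal length")
    return sum(x != y for x, y in zip(a, b))

source = ["the", "cat", "sat"]
target = ["cat", "sat", "down"]
print(componentwise_mismatches(source, target))  # 3
```

Every position differs here, so the componentwise count is 3. The Levenshtein distance between the same two lists is only 2 (delete "the", insert "down").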

rljacobson • Mar 29 '19

@rljacobson I believe that is what he wants. For example, I sometimes need to apply Levenshtein editops to word-level rather than character-level sequences. In that case, being able to pass a list of words to Levenshtein would be helpful.

I currently have a very janky way of doing this: I map each unique element of my lists onto a unique character and then pass the resulting strings to the Levenshtein algorithm. The problem with this method is that there is a limited number of characters, but there can be an unlimited number of unique elements.

import Levenshtein

def Levenshtein_editops_list(source, target):
    # Map each distinct list element to a single character so that
    # Levenshtein.editops, which only accepts strings, can compare the lists.
    unique_elements = sorted(set(source + target))
    char_list = list('abcdefghijklmnopqrstuvwxyz0123456789')
    if len(unique_elements) > len(char_list):
        # This alphabet can only encode 36 distinct elements.
        raise ValueError("too many elements")
    unique_element_map = {ele: char_list[i] for i, ele in enumerate(unique_elements)}
    source_str = ''.join(unique_element_map[ele] for ele in source)
    target_str = ''.join(unique_element_map[ele] for ele in target)
    return Levenshtein.editops(source_str, target_str)


target = 'The Cat ate the moon'.split()
source = "The dog jumped over the moon".split()
print(Levenshtein_editops_list(source, target))

Output: [('delete', 1, 1), ('replace', 2, 1), ('replace', 3, 2)]
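In the output above, each tuple is (operation, source position, destination position), the format returned by Levenshtein.editops. The library's own Levenshtein.apply_edit works on strings, so applying the operations back to word lists needs a small helper. This is a minimal sketch: apply_editops_list is a hypothetical name, and it assumes the operations are sorted by source position, as editops returns them.

```python
from typing import List, Sequence, Tuple

def apply_editops_list(ops: Sequence[Tuple[str, int, int]],
                       source: Sequence, target: Sequence) -> List:
    """Rebuild target from source using (op, source_pos, target_pos) tuples.
    Hypothetical helper; assumes ops are sorted by source position."""
    result = []
    src_pos = 0  # next source element not yet copied or consumed
    for op, i, j in ops:
        result.extend(source[src_pos:i])  # copy untouched elements before i
        src_pos = i
        if op == "insert":
            result.append(target[j])      # insert target[j] before source[i]
        elif op == "replace":
            result.append(target[j])      # source[i] becomes target[j]
            src_pos = i + 1
        elif op == "delete":
            src_pos = i + 1               # drop source[i]
        else:
            raise ValueError(f"unknown edit operation: {op!r}")
    result.extend(source[src_pos:])  # copy the untouched tail
    return result

source = "The dog jumped over the moon".split()
target = "The Cat ate the moon".split()
ops = [('delete', 1, 1), ('replace', 2, 1), ('replace', 3, 2)]
print(apply_editops_list(ops, source, target))  # ['The', 'Cat', 'ate', 'the', 'moon']
```

Read at word level, the three operations delete "dog", replace "jumped" with "Cat", and replace "over" with "ate".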

kkawabat • Aug 27 '19

@kkawabat Thanks for your function. I made a small change (generating the characters with chr(i) instead of using a fixed alphabet), and now it works perfectly in my case.

import Levenshtein

def levenshtein_editops_list(source, target):
    unique_elements = sorted(set(source + target))
    # chr() accepts code points 0..0x10FFFF, so far more distinct elements
    # can be encoded than with a fixed alphabet.
    if len(unique_elements) > 0x110000:
        raise ValueError("too many elements")
    unique_element_map = {ele: chr(i) for i, ele in enumerate(unique_elements)}
    source_str = ''.join(unique_element_map[ele] for ele in source)
    target_str = ''.join(unique_element_map[ele] for ele in target)
    return Levenshtein.editops(source_str, target_str)
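Both workarounds still need the list elements to be hashable and sortable, and both go through a character encoding. An alternative that skips the encoding is a pure-Python Levenshtein table with a backtrace that emits tuples in the same (operation, source position, destination position) shape as Levenshtein.editops. This is a minimal sketch, assuming elements only need to support ==. The name list_editops is hypothetical. It uses O(len(source) * len(target)) time and memory in pure Python, so it will be much slower than the C library on long inputs. When several edit scripts have the same cost, it may pick a different one than the library does.

```python
from typing import List, Sequence, Tuple

def list_editops(source: Sequence, target: Sequence) -> List[Tuple[str, int, int]]:
    """Edit operations turning source into target, as (op, source_pos, target_pos).
    Hypothetical pure-Python helper; ties may differ from Levenshtein.editops."""
    n, m = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete
                             dist[i][j - 1] + 1,         # insert
                             dist[i - 1][j - 1] + cost)  # keep or replace
    # Walk back from the bottom-right cell, collecting operations in reverse.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # elements match: no operation
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("replace", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", i - 1, j))
            i -= 1
        else:
            ops.append(("insert", i, j - 1))
            j -= 1
    ops.reverse()
    return ops

source = "The dog jumped over the moon".split()
target = "The Cat ate the moon".split()
print(list_editops(source, target))
# [('delete', 1, 1), ('replace', 2, 1), ('replace', 3, 2)]
```

For these word lists, the sketch returns the same operations as the output kkawabat posted. It has no limit on the number of distinct elements, and it also works for unhashable elements such as nested lists.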

MrShininnnnn • May 09 '20

Given that I cannot actively maintain this repo for the foreseeable future and that a solution exists, I am inclined to close this issue, unless someone else wants to work on it.

rljacobson • May 29 '20