abydos
Fuzzy intersection using Levenshtein alignment
Add another intersection type: a fuzzy intersection based on ordered tokens, using Levenshtein alignments to parcel out the similarity weights. Given two aligned strings:
- if two aligned tokens are equal, the weight for the token is divided by 2 and added to the intersection
- if two aligned tokens are unequal, the weight for each token is added to its own non-intersection, except for '-' tokens, which accrue no weight because they represent insertions/deletions
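
A rough sketch of how the rules above might look in plain Python. It is only an illustration of the proposal, not existing abydos code: the names `levenshtein_align`, `fuzzy_intersection`, `GAP`, and the `weight` callback are hypothetical, each token is assumed to have weight 1 by default, and the "divided by 2" rule is read literally (one aligned equal pair contributes half the token's weight).

```python
from typing import Callable, List, Sequence, Tuple

GAP = '-'  # placeholder for an insertion/deletion; assumes no real token equals '-'


def levenshtein_align(
    src: Sequence[str], tar: Sequence[str]
) -> List[Tuple[str, str]]:
    """Align two token sequences with unit-cost Levenshtein edits."""
    m, n = len(src), len(tar)
    # dist[i][j] = edit distance between src[:i] and tar[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i
    for j in range(1, n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tar[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion from src
                dist[i][j - 1] + 1,         # insertion from tar
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Trace back from the bottom-right corner, preferring the diagonal.
    pairs: List[Tuple[str, str]] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if src[i - 1] == tar[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((src[i - 1], tar[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((src[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, tar[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def fuzzy_intersection(
    src: Sequence[str],
    tar: Sequence[str],
    weight: Callable[[str], float] = lambda token: 1.0,
) -> Tuple[float, float, float]:
    """Return (intersection, src_only, tar_only) weights."""
    intersection = src_only = tar_only = 0.0
    for a, b in levenshtein_align(src, tar):
        if a == b:
            # Equal tokens: half the token's weight goes to the intersection.
            intersection += weight(a) / 2
        else:
            # Unequal tokens: each real token counts toward its own side;
            # gap tokens accrue no weight.
            if a != GAP:
                src_only += weight(a)
            if b != GAP:
                tar_only += weight(b)
    return intersection, src_only, tar_only
```

For example, aligning `['a', 'b', 'c']` with `['a', 'x', 'c', 'd']` pairs `a`/`a`, `b`/`x`, `c`/`c`, and `-`/`d`, so under these assumptions `fuzzy_intersection` returns `(1.0, 1.0, 2.0)`. Whether the halved weight should instead be the combined weight from both strings is an open question for the actual implementation.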