Chris Little
Since the addition of PAM & BLOSUM matrices isn't directly applicable to linguistic string distances (they're used for bioinformatics), this can wait until after 1.0.
Matrices to do: PAM120, PAM160, PAM250, BLOSUM45, BLOSUM62, BLOSUM80 (ftp://ftp.ncbi.nih.gov/blast/matrices/)
Damerau-Levenshtein should be high priority. Without it, Soft Jaccard can't be implemented with its default parameters, which use Damerau-Levenshtein (DL) as the alignment algorithm/distance measure.
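For reference, a minimal sketch of the restricted form of Damerau-Levenshtein (optimal string alignment), which adds adjacent transpositions to plain Levenshtein. This is not the library's implementation: the function name `osa_distance` and the unit costs are assumptions for illustration, and the unrestricted DL variant needs extra bookkeeping not shown here.

```python
def osa_distance(src, tar):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical helper with unit costs; not taken from the library.
    """
    rows, cols = len(src) + 1, len(tar) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of src[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of tar[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if src[i - 1] == tar[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "ca" -> "ac"
            if (
                i > 1
                and j > 1
                and src[i - 1] == tar[j - 2]
                and src[i - 2] == tar[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

In this restricted form no substring is edited more than once, so `osa_distance("ca", "abc")` is 3, whereas unrestricted DL gives 2.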
Alignments really need to be done by creating a secondary matrix to track the tracebacks, as in PhoneticEditDistance & DiscountedLevenshtein. The greedy method won't always get the correct alignment.
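A rough sketch of the traceback approach, assuming plain unit-cost Levenshtein rather than the actual cost functions in PhoneticEditDistance or DiscountedLevenshtein. The function name `align`, the move labels, and the `gap` parameter are hypothetical. A second matrix records which move produced each cell, and the alignment is read off by walking back from the bottom-right corner.

```python
def align(src, tar, gap="-"):
    """Return (distance, aligned_src, aligned_tar) using a traceback matrix.

    Hypothetical sketch with unit costs; not the library's implementation.
    """
    rows, cols = len(src) + 1, len(tar) + 1
    dist = [[0] * cols for _ in range(rows)]
    trace = [[None] * cols for _ in range(rows)]  # move that produced each cell

    for i in range(1, rows):
        dist[i][0], trace[i][0] = i, "up"
    for j in range(1, cols):
        dist[0][j], trace[0][j] = j, "left"

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if src[i - 1] == tar[j - 1] else 1
            options = [
                (dist[i - 1][j - 1] + cost, "diag"),  # match or substitution
                (dist[i - 1][j] + 1, "up"),           # deletion from src
                (dist[i][j - 1] + 1, "left"),         # insertion into src
            ]
            dist[i][j], trace[i][j] = min(options, key=lambda o: o[0])

    # Walk the recorded moves back from the bottom-right corner.
    i, j = len(src), len(tar)
    out_src, out_tar = [], []
    while i > 0 or j > 0:
        move = trace[i][j]
        if move == "diag":
            out_src.append(src[i - 1])
            out_tar.append(tar[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            out_src.append(src[i - 1])
            out_tar.append(gap)
            i -= 1
        else:
            out_src.append(gap)
            out_tar.append(tar[j - 1])
            j -= 1

    return dist[-1][-1], "".join(reversed(out_src)), "".join(reversed(out_tar))
```

Because every move is recorded while the distance matrix is filled, the recovered path always has the optimal cost. Ties are broken here in favor of the diagonal move, which is an arbitrary choice.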
The English-language description predates and differs from the French-language specification.
For technical reasons, linking data relating to the same person, the same couple, or the same household has become a necessity in demography, both historical and contemporary. In...
def henry(word):
    """Calculate the Henry code for a word.

    Henry Code is defined in: Henry, Louis. 1976. "Projet de transcription phonétique des noms de famille." Annales de Démographie Historique, 1976....
I'm not sure that Henry code can be implemented on the basis of https://www.persee.fr/doc/adh_0066-2062_1976_num_1976_1_1313. The specification is too underspecified, ambiguous, & subjective.
Possible alternative description: https://www150.statcan.gc.ca/n1/pub/85-602-x/4193729-eng.pdf
Bumping to 0.4 since this looks a bit involved.