Chris Little

32 comments by Chris Little

Since the addition of PAM & BLOSUM matrices isn't directly applicable to linguistic string distances (they're used for bioinformatics), this can wait until after 1.0.

Matrices to do: PAM120, PAM160, PAM250, BLOSUM45, BLOSUM62, BLOSUM80 (from ftp://ftp.ncbi.nih.gov/blast/matrices/)
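A rough sketch of how one of these files might be read, assuming the whitespace-delimited layout of the NCBI matrix files (`#` comment lines, a header row of column symbols, then one row per symbol). The function name `parse_ncbi_matrix` and the toy two-letter matrix are hypothetical, and the scores are made up for illustration; they are not real PAM or BLOSUM values.

```python
def parse_ncbi_matrix(text):
    """Parse NCBI-style matrix text into a dict keyed by (row, col) symbols."""
    # Drop blank lines and '#' comment lines.
    rows = [ln for ln in text.splitlines()
            if ln.strip() and not ln.startswith('#')]
    cols = rows[0].split()  # header row: the column symbols
    scores = {}
    for ln in rows[1:]:
        parts = ln.split()
        row_sym = parts[0]  # first field of each row is the row symbol
        for col_sym, val in zip(cols, parts[1:]):
            scores[(row_sym, col_sym)] = int(val)
    return scores


# Toy matrix with invented values, only to show the layout.
SAMPLE = """# toy example, not a real PAM/BLOSUM matrix
   A  R
A  2 -1
R -1  3
"""

toy = parse_ncbi_matrix(SAMPLE)
```

A dict keyed by symbol pairs keeps lookups simple; an actual implementation might prefer a NumPy array plus an alphabet index.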

Damerau-Levenshtein should be high priority. Without it, Soft Jaccard can't be implemented with its default parameters, which include using DL as the alignment algorithm/distance measure.
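For reference, a minimal sketch of the restricted variant of Damerau-Levenshtein (often called optimal string alignment), assuming unit costs for insertion, deletion, substitution, and adjacent transposition. The name `osa_distance` is hypothetical and this is not the library's implementation; the unrestricted variant needs extra bookkeeping and can give smaller distances.

```python
def osa_distance(src, tar):
    """Restricted Damerau-Levenshtein (optimal string alignment), unit costs."""
    n, m = len(src), len(tar)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of src[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of tar[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tar[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and src[i - 1] == tar[j - 2]
                    and src[i - 2] == tar[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]
```

With this variant, `osa_distance("ca", "abc")` is 3, whereas unrestricted Damerau-Levenshtein gives 2 for the same pair.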

Alignments really need to be done by creating a secondary matrix to track the tracebacks, as in PhoneticEditDistance & DiscountedLevenshtein. The greedy method won't always get the correct alignment.
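A minimal sketch of the traceback idea, using plain unit-cost Levenshtein rather than either of the classes named above. The function name `align`, the `'-'` gap symbol, and the tie-breaking order (substitution, then deletion, then insertion) are assumptions made for illustration.

```python
def align(src, tar):
    """Return (distance, aligned_src, aligned_tar) using a traceback matrix."""
    n, m = len(src), len(tar)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    trace = [[None] * (m + 1) for _ in range(n + 1)]  # operation used per cell
    for i in range(1, n + 1):
        dist[i][0], trace[i][0] = i, 'del'
    for j in range(1, m + 1):
        dist[0][j], trace[0][j] = j, 'ins'
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (src[i - 1] != tar[j - 1])
            dele = dist[i - 1][j] + 1
            ins = dist[i][j - 1] + 1
            best = min(sub, dele, ins)
            dist[i][j] = best
            # Record which move produced the optimum (ties: sub, del, ins).
            if best == sub:
                trace[i][j] = 'sub'
            elif best == dele:
                trace[i][j] = 'del'
            else:
                trace[i][j] = 'ins'
    # Walk the recorded moves back from the bottom-right corner.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        op = trace[i][j]
        if op == 'sub':
            a.append(src[i - 1]); b.append(tar[j - 1]); i -= 1; j -= 1
        elif op == 'del':
            a.append(src[i - 1]); b.append('-'); i -= 1
        else:
            a.append('-'); b.append(tar[j - 1]); j -= 1
    return dist[n][m], ''.join(reversed(a)), ''.join(reversed(b))
```

Because every cell stores the move that produced its optimal value, the walk back always follows an optimal path, so the number of differing columns in the two aligned strings equals the returned distance.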

The English-language description predates and is different from the French-language specification.

For technical reasons, linking data relating to the same person, the same couple, or the same household has become a necessity in demography, both historical and contemporary. ...

def henry(word):
    """Calculate the Henry code for a word.

    Henry Code is defined in: Henry, Louis. 1976. "Projet de transcription phonétique des noms de famille." Annales de Démographie Historique, 1976....

I'm not sure that Henry code can be implemented on the basis of https://www.persee.fr/doc/adh_0066-2062_1976_num_1976_1_1313. There's too much underspecification, ambiguity, & subjectivity.

A possible alternative description: https://www150.statcan.gc.ca/n1/pub/85-602-x/4193729-eng.pdf

Bumping to 0.4 since this looks a bit involved.