python-string-similarity

Alternate Algorithms from TextDistance

Open BradKML opened this issue 4 years ago • 2 comments

TextDistance provides several other algorithms that could be considered for inclusion here.

Edit based

  • MLIPNS http://www.sial.iias.spb.su/files/386-386-1-PB.pdf
  • Strcmp95 http://cpansearch.perl.org/src/SCW/Text-JaroWinkler-0.1/strcmp95.c
  • Needleman-Wunsch https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
  • Gotoh http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/LOA/Lec6-Sequence-Alignment-Affine-Gaps-Gotoh1982.pdf
  • Smith-Waterman https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm
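To give a sense of what the alignment-style entries in this group involve, here is a minimal sketch of a Needleman-Wunsch global alignment score. The function name and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions picked for illustration; they are not taken from TextDistance or from this library.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score with a linear gap penalty (illustrative values)."""
    # First DP row: aligning the empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # First column: aligning a[:i] with the empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)  # align ca with cb
            up = prev[j] + gap                                      # gap in b
            left = curr[j - 1] + gap                                # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("abc", "abd"))
```

Gotoh extends the same recurrence with affine gap costs, and Smith-Waterman clamps each cell at zero to find the best local alignment instead of a global one.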

Token based

  • Tversky index https://en.wikipedia.org/wiki/Tversky_index
  • Tanimoto distance https://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_similarity_and_distance
  • Monge-Elkan https://www.academia.edu/200314/Generalized_Monge-Elkan_Method_for_Approximate_Text_String_Comparison
  • Bag distance https://github.com/Yomguithereal/talisman/blob/master/src/metrics/bag.js
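Two of the token-based measures are small enough to sketch with the standard library. The function names below are hypothetical, and returning 1.0 when both inputs are empty is an assumed convention rather than part of the definition.

```python
from collections import Counter


def tversky_index(x, y, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index over two token collections, treated as sets."""
    x, y = set(x), set(y)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    # Assumed convention: two empty inputs count as identical.
    return common / denom if denom else 1.0


def bag_distance(a, b) -> int:
    """Bag distance: the larger of the two multiset differences."""
    ca, cb = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts.
    return max(sum((ca - cb).values()), sum((cb - ca).values()))


print(tversky_index(["a", "b", "c"], ["b", "c", "d"]))
print(bag_distance("cat", "hat"))
```

With `alpha = beta = 1` the Tversky index reduces to the Tanimoto (Jaccard) coefficient, and with `alpha = beta = 0.5` it reduces to the Sørensen-Dice coefficient, so one parameterised function could cover several of the listed measures.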

Alternate Methods

  • Ratcliff-Obershelp similarity https://en.wikipedia.org/wiki/Gestalt_Pattern_Matching
  • Normalized compression distance (requires a compression algorithm) https://en.wikipedia.org/wiki/Normalized_compression_distance#Normalized_compression_distance
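Both of these can be approximated with the Python standard library, which may help when judging whether they are worth adding. This is only a sketch: `difflib.SequenceMatcher` is based on Ratcliff-Obershelp but is not guaranteed to match other implementations exactly, and using `zlib` as the compressor is an assumption (any compressor could be substituted). The function names are hypothetical.

```python
import zlib
from difflib import SequenceMatcher


def gestalt_similarity(a: str, b: str) -> float:
    """Ratcliff-Obershelp style similarity: 2 * matched chars / total chars."""
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def ncd(a: str, b: str) -> float:
    """Normalized compression distance, with zlib as the assumed compressor."""
    xa, xb = a.encode("utf-8"), b.encode("utf-8")
    ca, cb = len(zlib.compress(xa)), len(zlib.compress(xb))
    cab = len(zlib.compress(xa + xb))
    return (cab - min(ca, cb)) / max(ca, cb)


print(gestalt_similarity("WIKIMEDIA", "WIKIMANIA"))
print(ncd("the quick brown fox " * 20, "the quick brown fox " * 20))
```

The compression-based score is noisy on short strings because the compressor's fixed overhead dominates, so it is probably only useful for longer inputs.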

Main source: https://github.com/life4/textdistance

BradKML • Nov 03 '21 13:11