java-string-similarity
Jaro-Winkler similarity on short strings
I am trying to use Jaro-Winkler similarity to check color strings coming from a user-submitted form against a palette of fixed colors.
Using Jaro-Winkler similarity, I get these kinds of results for very short strings:
s1 = "ed", s2 = "red" -> similarity = 0
s1 = "nude", s2 = "red" -> similarity = 0.5833333134651184
Is it correct to get similarity = 0 in the first case?
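The Jaro similarity of "ed" and "red" is 0, since the number of matching characters (parameter m) is 0. The matching window is floor(max(2, 3) / 2) - 1 = 0, so a character only counts as a match if it sits at the same index in both strings, and neither index matches (e vs r, d vs e).
Furthermore, the length of the common prefix of s1 and s2 (parameter l) is 0, because the strings already differ at the first character.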
This results in a Jaro-Winkler Similarity of 0 as
sim_jw = sim_j + l * 0.1 * (1 - sim_j) = 0 + 0 * 0.1 * 1 = 0
Jaro-Winkler gives more favorable ratings to strings that match from the beginning, so a pair like this one, which only overlaps at the end, gets no prefix bonus.
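To make the arithmetic concrete, here is a minimal Java sketch of the textbook Jaro and Jaro-Winkler definitions. It assumes the usual conventions (matching window floor(max(|s1|, |s2|) / 2) - 1, common prefix l capped at 4 characters, scaling factor 0.1) and is not the java-string-similarity library's own code; the class and method names are hypothetical. For "ed" vs "red" the window is 0, so m = 0 and both scores come out as 0.

```java
// Hypothetical class: a textbook Jaro / Jaro-Winkler sketch, not the library's implementation.
public class JaroWinklerSketch {

    // Classic Jaro similarity: (m/|s1| + m/|s2| + (m - t)/m) / 3.
    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) {
            return 1.0;
        }
        // Two equal characters only count as a match if their indices differ by at most this window.
        int window = Math.max(Math.max(s1.length(), s2.length()) / 2 - 1, 0);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int m = 0;
        for (int i = 0; i < s1.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    m++;
                    break;
                }
            }
        }
        if (m == 0) {
            return 0.0; // no matching characters, as for "ed" vs "red"
        }
        // Walk both sets of matched characters in order; each out-of-order pair is half a transposition.
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (matched1[i]) {
                while (!matched2[k]) {
                    k++;
                }
                if (s1.charAt(i) != s2.charAt(k)) {
                    halfTranspositions++;
                }
                k++;
            }
        }
        double t = halfTranspositions / 2.0;
        return ((double) m / s1.length() + (double) m / s2.length() + (m - t) / m) / 3.0;
    }

    // Jaro-Winkler: sim_jw = sim_j + l * 0.1 * (1 - sim_j), with the common prefix l capped at 4.
    static double jaroWinkler(String s1, String s2) {
        double simJ = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int l = 0;
        while (l < maxPrefix && s1.charAt(l) == s2.charAt(l)) {
            l++;
        }
        return simJ + l * 0.1 * (1 - simJ);
    }

    public static void main(String[] args) {
        System.out.println(jaro("ed", "red"));               // 0.0: window is 0, so m = 0
        System.out.println(jaroWinkler("ed", "red"));        // 0.0: l = 0, so no prefix bonus
        System.out.println(jaroWinkler("MARTHA", "MARHTA")); // about 0.961 (textbook example)
    }
}
```

Implementations differ in small details (prefix cap, boost threshold, float vs double arithmetic), so this sketch is not guaranteed to reproduce every figure quoted in the question.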
When I compare the two strings "abcdefghij" and "aaaaaaaaa" with Jaro-Winkler, my output is around 0.4023.
When I check the same pair on https://asecuritysite.com/forensics/simstring, it gives 0.46. Can anyone help me understand the difference?
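One possible explanation for the 0.40 vs 0.46 gap, offered as an assumption rather than a confirmed diagnosis: Winkler's original formulation only adds the prefix bonus when the Jaro score exceeds a "boost threshold" (commonly 0.7), while other tools apply the bonus unconditionally. For this pair only the leading "a" falls inside the matching window (m = 1, t = 0), so sim_j = (1/10 + 1/9 + 1) / 3 ≈ 0.4037, which is below 0.7, and the common prefix is l = 1. The hypothetical class below computes both variants with the same textbook Jaro definition.

```java
// Hypothetical class: compares a gated and an always-on Winkler prefix bonus.
public class BoostThresholdDemo {

    // Assumed boost threshold; 0.7 is the value commonly attributed to Winkler.
    static final double BOOST_THRESHOLD = 0.7;

    // Classic Jaro similarity: matching window, matches m, transpositions t.
    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) {
            return 1.0;
        }
        int window = Math.max(Math.max(s1.length(), s2.length()) / 2 - 1, 0);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int m = 0;
        for (int i = 0; i < s1.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(s2.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    m++;
                    break;
                }
            }
        }
        if (m == 0) {
            return 0.0;
        }
        int halfTranspositions = 0;
        int k = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (matched1[i]) {
                while (!matched2[k]) {
                    k++;
                }
                if (s1.charAt(i) != s2.charAt(k)) {
                    halfTranspositions++;
                }
                k++;
            }
        }
        double t = halfTranspositions / 2.0;
        return ((double) m / s1.length() + (double) m / s2.length() + (m - t) / m) / 3.0;
    }

    // Length of the common prefix, capped at 4 characters.
    static int commonPrefix(String s1, String s2) {
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        int l = 0;
        while (l < limit && s1.charAt(l) == s2.charAt(l)) {
            l++;
        }
        return l;
    }

    // gated = true: add the prefix bonus only when sim_j exceeds BOOST_THRESHOLD.
    // gated = false: always add it.
    static double jaroWinkler(String s1, String s2, boolean gated) {
        double simJ = jaro(s1, s2);
        if (gated && simJ <= BOOST_THRESHOLD) {
            return simJ;
        }
        return simJ + commonPrefix(s1, s2) * 0.1 * (1 - simJ);
    }

    public static void main(String[] args) {
        String a = "abcdefghij";
        String b = "aaaaaaaaa";
        System.out.println("gated:  " + jaroWinkler(a, b, true));  // about 0.4037
        System.out.println("always: " + jaroWinkler(a, b, false)); // about 0.4633
    }
}
```

The always-boosted figure, 0.4037 + 1 * 0.1 * (1 - 0.4037) ≈ 0.4633, lines up with the 0.46 from the website, while the gated figure is close to (though not exactly) the 0.4023 reported above, so part of the remaining difference may come from other implementation details.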