Difference between libraries

Open kjellkvinge opened this issue 2 years ago • 1 comments

Hi.

I have tried this library and compared it with https://github.com/adrg

In some cases we experience differences in the results:

package main

import (
	"fmt"

	"github.com/adrg/strutil/metrics"
	"github.com/antzucaro/matchr"
)

func main() {
	r2 := "wilson kjell"
	r1 := "wilson mathias"
	fmt.Printf("matchr long distance:%f\n", matchr.JaroWinkler(r1, r2, true))
	fmt.Printf("matchr short distance:%f\n", matchr.JaroWinkler(r1, r2, false))

	m := metrics.NewJaroWinkler()
	fmt.Printf("adrg:%f\n", m.Compare(r2, r1))
}
// matchr long distance:0.694444
// matchr short distance:0.694444
// adrg:0.816667

https://go.dev/play/p/z2IQsqYjIDQ

What is the correct distance between these strings?

The original implementation (strcmp95), called from Perl, gives us 0.83523.

Thank you.

kjellkvinge, Jun 13 '22 08:06

Hi.

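It looks like the difference between adrg/strutil and matchr is the 0.7 limit, which is not implemented in strutil. matchr appears to apply the Winkler prefix bonus only when the plain Jaro score is above 0.7. Here the Jaro score is 0.694444, so matchr returns it unchanged, while strutil adds the bonus for the shared prefix and reports 0.816667.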
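To check this, here is a minimal sketch of Jaro and Jaro-Winkler with the threshold as a parameter. It is not the code of either library: `jaro` and `winkler` are hypothetical helpers written for this comment, they compare bytes rather than runes (fine for these ASCII strings), and the prefix scale of 0.1 with a maximum prefix length of 4 is assumed to be the usual Winkler setting.

```go
package main

import "fmt"

// jaro returns the Jaro similarity of a and b, compared byte by byte.
func jaro(a, b string) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	// Two characters can only match if they are at most `window` positions apart.
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA := make([]bool, len(a))
	matchedB := make([]bool, len(b))
	matches := 0
	for i := 0; i < len(a); i++ {
		lo := i - window
		if lo < 0 {
			lo = 0
		}
		hi := i + window + 1
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && a[i] == b[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Walk the matched characters of both strings in order and count
	// the positions where they disagree; half of that is the transposition count.
	outOfOrder := 0
	j := 0
	for i := 0; i < len(a); i++ {
		if !matchedA[i] {
			continue
		}
		for !matchedB[j] {
			j++
		}
		if a[i] != b[j] {
			outOfOrder++
		}
		j++
	}
	m := float64(matches)
	t := float64(outOfOrder / 2)
	return (m/float64(len(a)) + m/float64(len(b)) + (m-t)/m) / 3
}

// winkler adds the common-prefix bonus to the Jaro score,
// but only when that score is strictly above threshold.
func winkler(a, b string, threshold float64) float64 {
	score := jaro(a, b)
	if score <= threshold {
		return score
	}
	prefix := 0
	for prefix < 4 && prefix < len(a) && prefix < len(b) && a[prefix] == b[prefix] {
		prefix++
	}
	return score + float64(prefix)*0.1*(1-score)
}

func main() {
	a, b := "wilson mathias", "wilson kjell"
	fmt.Printf("jaro:                   %f\n", jaro(a, b))           // 0.694444
	fmt.Printf("winkler, threshold 0.7: %f\n", winkler(a, b, 0.7)) // 0.694444
	fmt.Printf("winkler, no threshold:  %f\n", winkler(a, b, 0))   // 0.816667
}
```

Only the seven characters of "wilson " match, with no transpositions, so the Jaro score is (7/14 + 7/12 + 1) / 3 = 0.694444. With the 0.7 threshold that is the final result, which matches matchr. Without it, the four-character prefix bonus gives 0.694444 + 4 * 0.1 * (1 - 0.694444) = 0.816667, which matches strutil.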

The difference between matchr and strcmp95/perl is probably because the phonetic part is not implemented in matchr.

kjellkvinge, Jun 13 '22 09:06