
ChapmanMeanLength returns nearly the same value

Open: David-Maisonave opened this issue 5 months ago • 2 comments

ChapmanMeanLength always returns 9.xx regardless of the inputs. Here are the results when comparing "David" to each of the following strings:

String	Result
David	9.223682
Dave	9.299208
Davdi	9.223682
Dadiv	9.223682
david	9.223682
Dovid	9.223682
Divad	9.223682
divaD	9.223682
Daves	9.223682
Maday	9.223682
xxxxx	9.223682
12345	9.223682
Dav id	9.148616
Da.v.id	9.07401
D-avid	9.148616

The first row should have a distance of zero, but it returns the same value as the strings "xxxxx" and "12345". I looked at the source code, and I don't see how this class could perform string comparisons. Is this an incomplete class?
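Every number in the table above can be reproduced from the combined length n = len("David") + len(other) alone, which supports the suspicion that the class never compares characters. The sketch below assumes the .NET port follows the Java SimMetrics ChapmanMeanLength formula, similarity = 1 − ((500 − n) / 500)^4 with a 500-character cap, and that the tool displays 10 × (1 − similarity). Both the cap constant and the display transform are assumptions chosen because they match the reported values; the function names are hypothetical.

```python
# Hypothetical re-implementation for illustration only.
# Assumes the port mirrors the Java SimMetrics ChapmanMeanLength formula.
MAX_COMBINED_LENGTH = 500  # assumed cap on len(s1) + len(s2)


def chapman_mean_length_similarity(s1: str, s2: str) -> float:
    """Similarity in [0, 1] that depends ONLY on len(s1) + len(s2)."""
    n = len(s1) + len(s2)
    if n > MAX_COMBINED_LENGTH:
        return 1.0
    scaled = (MAX_COMBINED_LENGTH - n) / MAX_COMBINED_LENGTH
    return 1.0 - scaled ** 4


def reported_score(s1: str, s2: str) -> float:
    """Assumed display transform: 10 * (1 - similarity), which matches the table."""
    return 10.0 * (1.0 - chapman_mean_length_similarity(s1, s2))


if __name__ == "__main__":
    for other in ["David", "Dave", "xxxxx", "12345", "Dav id", "Da.v.id"]:
        print(f"{other}\t{reported_score('David', other):.6f}")
```

Running it prints 9.223682 for "David", "xxxxx" and "12345" (n = 10), 9.299208 for "Dave" (n = 9), 9.148616 for "Dav id" (n = 11) and 9.074010 for "Da.v.id" (n = 12), matching the table. The characters themselves never enter the calculation.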

David-Maisonave, Aug 18 '25 22:08

When I read the "description" of ChapmanMeanLength, it seems that it is not implemented the way one would expect.

I asked ChatGPT for an implementation, and I'll add it as a ChapmanMeanLengthTrue class.

StefH, Aug 19 '25 19:08

I get the sense that ChapmanMeanLength is not really a string matching algorithm, but its true purpose is not clear to me.
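One way to see what the measure does, assuming it follows the Java SimMetrics formula with a 500-character cap (an assumption, not confirmed in this thread), is to write it as a function of the combined length alone:

$$
\operatorname{sim}(s_1, s_2) =
\begin{cases}
1 - \left(\dfrac{500 - n}{500}\right)^{4}, & n \le 500,\\[4pt]
1, & n > 500,
\end{cases}
\qquad n = |s_1| + |s_2|.
$$

Under that assumption the value is 0 only for two empty strings, rises monotonically toward 1 as the pair gets longer, and any two pairs with the same combined length tie. For example, n = 10 gives 1 − 0.98^4 ≈ 0.0776 for both ("David", "David") and ("David", "xxxxx"). It reads more like a length-based weighting term than a measure of how alike two strings are.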

FYI: I'm using a slightly modified version of the SimMetrics code in my own project, SqliteFuzzyPlusExtension.

Most of the algorithms work great. ChapmanMeanLength is the only one I had issues with.

David-Maisonave, Aug 20 '25 20:08