
Searcher with JaccardMeasure does not seem to work

Open jtourille opened this issue 5 years ago • 0 comments

Hi,

I'm not sure whether this is a bug. When I replace CosineMeasure with JaccardMeasure in the MWE and use character 1-grams, I get matches whose scores are below the threshold passed to ranked_search (0.8 in the example below).

from simstring.feature_extractor.character_ngram import CharacterNgramFeatureExtractor
from simstring.measure.jaccard import JaccardMeasure
from simstring.database.dict import DictDatabase
from simstring.searcher import Searcher

db = DictDatabase(CharacterNgramFeatureExtractor(1))
db.add('fibrates')

searcher = Searcher(db, JaccardMeasure())
results = searcher.ranked_search('abattoirs', 0.8)
print(results)

[[0.7, 'fibrates']]
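
For reference, the reported score can be checked by hand. The sketch below does not use simstring; `char_unigrams` and `jaccard` are hypothetical helpers. It assumes the 1-gram features are the set of distinct characters of the string padded with one space on each side. That padding is a guess, chosen because it reproduces the 0.7 above; without padding the value would be 6/9, about 0.67. Either way the Jaccard similarity is below 0.8, so 'fibrates' would not be expected in the results.

```python
def char_unigrams(s, pad=" "):
    # Hypothetical helper: distinct characters of the padded string.
    # The single-space padding is an assumption, not confirmed library behaviour.
    return set(pad + s + pad)


def jaccard(x, y):
    # Jaccard similarity: |X ∩ Y| / |X ∪ Y|
    return len(x & y) / len(x | y)


query = char_unigrams("abattoirs")  # 8 distinct features, including the pad
entry = char_unigrams("fibrates")   # 9 distinct features, including the pad

score = jaccard(query, entry)       # 7 shared features / 10 in the union
print(round(score, 2))              # 0.7
print(score >= 0.8)                 # False: below the requested threshold
```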

jtourille, Aug 03 '20 20:08