How to rank matches in short entries above matches in long entries?

Open halukkaramete opened this issue 3 years ago • 5 comments

Is there a built-in option that gives more weight to the density of matching characters (I mean the matches highlighted in red), so that short entries are favored over lengthier ones? I think that would improve relevancy automatically.

Example screenshot:

Here, I searched for Hajj (in a 1.8 MB file of 13,000 items), and the most on-target entries (listed below) ended up at around 100th place in the suggested items.

  • Hajj (Pilgrimage) 247 @Sahih al-Bukhari
  • Hajj: Hajj 261 @Muwatta Malik

They are pretty short, bingo-like matches, yet plenty of long ones preceded them.

How can I easily raise them to the top? Or at least near the top?

Here are the winners...

Screen Shot 2022-04-11 at 11 45 09 AM

Here are the poor little ones being crushed by the winners:

Screen Shot 2022-04-11 at 11 45 26 AM

Clearly, the red density on those short ones is noticeably higher.

Especially this guy:

Screen Shot 2022-04-11 at 11 59 08 AM

What do you think Jean?

I have a feeling this has to do with your compare method.
That may be the solution, but if so, how do I write that comparison?

I use

bonus_match_start: 0.6, 
highlight_bridge_gap: 0 

I'd have to see it in practice, but what I'm asking for could make a tremendous difference in quality, especially when main topics and sub-topics are searched, as in my case.

halukkaramete avatar Apr 11 '22 09:04 halukkaramete

You can use the sorter option and combine the length and the score: https://github.com/jeancroy/FuzzySearch/blob/master/src/init.js#L52
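
As a rough illustration of "combining the length and the score", here is a minimal sketch of a comparator that subtracts a small length penalty from each score. It assumes each result exposes a numeric `score` and a string `item`, as in the sorter posted later in this thread. `LENGTH_PENALTY`, `adjustedScore`, and `sortByScoreAndLength` are hypothetical names, not part of FuzzySearch, and the weight would need tuning against real data.

```javascript
// Hypothetical weight: how many score points one extra character costs.
// Not part of FuzzySearch; tune it against your own data.
var LENGTH_PENALTY = 0.01;

// Assumes each result exposes `score` (number) and `item` (a string).
function adjustedScore(result) {
    return result.score - LENGTH_PENALTY * result.item.length;
}

// Comparator intended for the `sorter` option: higher adjusted score first.
function sortByScoreAndLength(a, b) {
    return adjustedScore(b) - adjustedScore(a);
}
```

It would be passed as `sorter: sortByScoreAndLength` in the options. A linear penalty is only one possible choice; if items are objects rather than strings, the length would have to come from whichever field is being searched.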

jeancroy avatar Apr 12 '22 13:04 jeancroy

I'll be honest, it looks like you want to recommend short paragraphs for a given theme. This library is more about finding a needle in a haystack.

Right now, machine learning as a service is mature enough that it may interest you. See for example https://docs.microsoft.com/en-us/azure/cognitive-services/language-service/question-answering/overview

jeancroy avatar Apr 12 '22 13:04 jeancroy

For those who do not know how to sort by size using the "sorter" option...

Add this to your options when setting up your FuzzySearch object:

sorter: myFunction,

and then define this function somewhere on your page:

function myFunction(a, b) {

    // Higher score first.
    var d = b.score - a.score;
    if (d !== 0) return d;

    // When two items are equal in score, the shorter one rises above the longer one.
    // Without this function, ties fall back to alphabetical order (the default).
    // Assumes each item is a string (or otherwise has a length).
    var ak = a.item.length, bk = b.item.length;
    return ak > bk ? 1 : (ak < bk ? -1 : 0);

}

halukkaramete avatar Apr 13 '22 00:04 halukkaramete

I'll be honest, it looks like you want to recommend short paragraph given a thematic. This library was more about find a needle in a haystack.

That's an entirely different take. I'm OK with using your library. I will prepare the JSON so that searches run only on signal words (excluding English stop words), which are stemmed (using Porter2), along with synonyms. What I'm working on is one of a kind when it comes to this subject, and I'd like to use your library. Once I launch this, it will be used by millions of people.

halukkaramete avatar Apr 13 '22 00:04 halukkaramete

if (d !== 0) return d;

I'd use something like abs(d) < 0.1 or d*d < 0.01

The thing is, the score is a float, so two scores are rarely exactly equal. You may find that two results are similar enough to start giving importance to overall size. I have not tested 0.1; you may find a value that suits you better.
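
A minimal sketch of how that tolerance could be applied to the sorter above, assuming results expose a numeric `score` and a string `item`. `SCORE_TOLERANCE` and `sortWithTolerance` are hypothetical names, and 0.1 is only the untested starting value mentioned here.

```javascript
// Hypothetical threshold below which two scores are treated as a tie.
// 0.1 is an untested starting value; adjust to taste.
var SCORE_TOLERANCE = 0.1;

function sortWithTolerance(a, b) {
    var d = b.score - a.score;

    // Scores differ clearly: higher score first.
    if (Math.abs(d) >= SCORE_TOLERANCE) return d;

    // Scores are close: shorter item first (assumes `item` is a string).
    var dl = a.item.length - b.item.length;
    if (dl !== 0) return dl;

    // Same length as well: fall back to the exact score difference.
    return d;
}
```

One caveat: a comparator with a tolerance is not strictly transitive (A can tie with B and B with C while A clearly beats C), so the final order may depend on input order in edge cases. If that turns out to matter, rounding scores into fixed-width buckets before comparing is one possible alternative.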

Keep up the good work then; I see the subject matter is important.

jeancroy avatar Apr 13 '22 02:04 jeancroy