Score gets bigger than 1, threshold not functional
Describe the bug
According to the docs (https://docs.orama.com/open-source/usage/search/introduction#what-does-the-search-method-return), score should be between 0 and 1.
Using the example data from the threshold doc (https://docs.orama.com/open-source/usage/search/threshold), scores are between 0 and 1 and filtering results based on thresholds works.
But with the data I am working with, scores get bigger than 1, which also makes the threshold useless.
To demonstrate the behavior, I used the English stopwords list from @orama/stopwords as sample data.
To Reproduce
import { create, insertMultiple, search } from '@orama/orama'
import { stopwords } from '@orama/stopwords/english'

const db = create({
  schema: {
    title: 'string',
  },
})

// Returns a random stopword prefixed with a space
const getRandomWord = () => ' ' + stopwords[Math.floor(Math.random() * stopwords.length)];

// Index every stopword alone, plus every stopword followed by a random one
insertMultiple(db, [
  ...stopwords.map(word => ({ title: word })),
  ...stopwords.map(word => ({ title: word + getRandomWord() })),
]);

const result = search(db, {
  term: 'her',
  threshold: 0,
});

console.log(result.hits.map(hit => {
  return {
    title: hit.document.title,
    score: hit.score,
  }
}));
Output:
[
  { title: 'hers', score: 6.584305791656615 },
  { title: 'herself', score: 6.584305791656615 },
  { title: "here's", score: 6.584305791656615 },
  { title: 'here', score: 6.227383875508277 },
  { title: 'her', score: 5.9423872295572 },
  { title: 'her her', score: 5.9423872295572 },
  { title: 'herself which', score: 3.70476794591908 },
  { title: "here's your", score: 3.70476794591908 },
  { title: "how's hers", score: 3.70476794591908 },
  { title: 'before herself', score: 3.70476794591908 }
]
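For context on where values like these can come from: assuming Orama's default full-text ranking is a BM25 variant (an assumption here; its exact formula and parameters may differ), a relevance score is an IDF weight times a saturating term-frequency factor, summed over query terms, and nothing in that formula bounds it by 1. The sketch below is plain textbook Okapi BM25 for a single term, not Orama's code; the names `idf` and `bm25TermScore` are hypothetical.

```javascript
// Plain Okapi BM25 for a single query term. This is an illustration, not
// Orama's implementation; k1 = 1.2 and b = 0.75 are common textbook defaults.

// Smoothed IDF (the "+1" keeps it positive): rarer terms get larger weights.
function idf(totalDocs, docsWithTerm) {
  return Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
}

function bm25TermScore({ tf, docLength, avgDocLength, totalDocs, docsWithTerm, k1 = 1.2, b = 0.75 }) {
  // Length normalization: shorter-than-average documents are boosted.
  const lengthNorm = k1 * (1 - b + b * (docLength / avgDocLength));
  // The term-frequency part saturates at (k1 + 1); the IDF factor grows with
  // the size of the index, so nothing caps the product at 1.
  return idf(totalDocs, docsWithTerm) * ((tf * (k1 + 1)) / (tf + lengthNorm));
}

// Same short document, one occurrence of the term, 1000 documents in the index.
const doc = { tf: 1, docLength: 1, avgDocLength: 1.5, totalDocs: 1000 };
const rare = bm25TermScore({ ...doc, docsWithTerm: 1 });     // term appears in 1 document
const common = bm25TermScore({ ...doc, docsWithTerm: 500 }); // term appears in half of them

console.log(rare > 1, common < 1); // true true
```

A rare term in a short title scores well above 1 while a very common term stays below it, so a score range of roughly 3.7 to 6.6 for a handful of matching short titles is not, by itself, a sign of corruption.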
Expected behavior
- Score should be between 0 and 1.
- Threshold should work as documented.
Environment Info
OS: macOS 15.3.2
Node: 18.20.5
Orama: 3.1.2
Affected areas
Search
Hi @marconett, we just released Orama v3.1.6 with a fix for the threshold. Would you mind testing whether your issue is solved? As for scores being > 1, I should probably update the docs. We could rescale scores to be between 0 and 1, but I'm not sure what the technical advantage would be. Please tell me if I missed something!
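If bounded scores are needed on the caller's side, one simple rescaling is to divide every score in a result set by the top score. The helpers below are a minimal sketch under that assumption; `normalizeScores` and `filterByRelativeScore` are hypothetical names, not part of Orama's API, and the cutoff they apply is unrelated to Orama's `threshold` search option. They only assume the `{ document, score }` shape of `result.hits` used in the reproduction above.

```javascript
// Hypothetical helpers, not part of Orama's API. They post-process an array
// shaped like `result.hits` (objects with `document` and `score`).

// Rescale scores into [0, 1] by dividing each one by the highest score.
function normalizeScores(hits) {
  const max = hits.reduce((m, h) => Math.max(m, h.score), 0);
  if (max <= 0) return hits.map((h) => ({ ...h, score: 0 }));
  return hits.map((h) => ({ ...h, score: h.score / max }));
}

// Keep only hits whose normalized score is at least `minScore`.
// This is a client-side cutoff, not Orama's `threshold` search option.
function filterByRelativeScore(hits, minScore) {
  return normalizeScores(hits).filter((h) => h.score >= minScore);
}

// Sample hits using scores from the output reported in this issue.
const hits = [
  { document: { title: 'hers' }, score: 6.584305791656615 },
  { document: { title: 'her' }, score: 5.9423872295572 },
  { document: { title: 'herself which' }, score: 3.70476794591908 },
];

console.log(filterByRelativeScore(hits, 0.6).map((h) => h.document.title));
// [ 'hers', 'her' ]
```

Because the divisor is the best hit of the current query, a normalized 0.9 only means "90% of this query's top score"; values are not comparable across queries, which is one reason such a rescaling adds little beyond presentation.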
FYI @micheleriva, the docs still need to be updated.