k-NN
[BUG] 2.19.0 - KNN cosinesimil scoring no longer adding 1
Describe the bug
The documentation describes the cosinesimil space type as follows:
Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can’t be below 0, the k-NN plugin adds 1 to get the final score.
In the latest version of OpenSearch this is no longer the case: the output is scaled down to sit between 0 and 1.
Related component
Plugins
To Reproduce
Index settings:
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 4
      }
    }
  },
  "settings": {
    "index": {
      "knn": true,
      "knn.space_type": "cosinesimil"
    }
  }
}
Document:
{
  "embedding": [
    1.0,
    1.0,
    1.0,
    1.0
  ]
}
Query:
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "must": []
        }
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "embedding",
          "query_value": [
            1.0,
            1.0,
            1.0,
            1.0
          ],
          "space_type": "cosinesimil"
        }
      }
    }
  }
}
Expected behavior
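Results.hits.hits.0._score was 2.0 in 2.18.0; in 2.19.0 it is 1.0. Scores now appear to be scaled to the range 0.0 to 1.0, whereas the documented behavior (cosine similarity plus 1) gives a range of 0.0 to 2.0.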
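For reference, here is a minimal Python sketch of the arithmetic behind those two numbers. It is not the plugin's code. `score_documented` follows the documented "adds 1" rule. `score_rescaled` is an assumption: it uses (1 + cos) / 2, which is one formula that maps [-1, 1] onto [0, 1] and reproduces the 1.0 observed above. The function names are hypothetical, and the actual formula should be checked against the k-NN documentation for your version.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_documented(cos):
    # Behavior described in the documentation quoted above: add 1 to the cosine.
    return 1.0 + cos


def score_rescaled(cos):
    # ASSUMPTION: one formula that yields a 0..1 range; verify against the docs.
    return (1.0 + cos) / 2.0


doc_vector = [1.0, 1.0, 1.0, 1.0]
query_vector = [1.0, 1.0, 1.0, 1.0]

cos = cosine_similarity(doc_vector, query_vector)
print(score_documented(cos))  # 2.0, matching the 2.18.0 result
print(score_rescaled(cos))    # 1.0, matching the 2.19.0 result
```

With identical vectors both formulas sit at the top of their range, so this reproduction alone cannot tell a rescale apart from a shift; a query vector that is not parallel to the document vector would.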
Additional Details
Plugins KNN
Host/Environment (please complete the following information):
- OS: Docker running on Orbstack on MacOS 15.3.1
- Version: 2.19.0
@opensearch-project/admin Can this be transferred to the https://github.com/opensearch-project/k-NN repository?
Hi @tmoitie, this was added as part of https://github.com/opensearch-project/k-NN/pull/2357. Basically, we wanted to make scoring consistent across different engines. Did this cause a problem? It looks like we might need to update the docs.
Yes, it was a quite unexpected backwards compatibility break, especially given that it deviated from the documentation. We were using score thresholds to define proximity of results.
Sorry I might be missing something but it's very clear from the documentation this is a breaking change: "Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can’t be below 0, the k-NN plugin adds 1 to get the final score."
Specifically, "opensearch relevance scores can't be below 0"
We now receive this error log in production a fair amount:
script score function must not produce negative scores, but got: [-1.0]
Is there any response on this? This is a breaking change, is it not?
@tmoitie In 2.19 we removed that line and updated the documentation to show the new score formula: https://docs.opensearch.org/docs/2.19/field-types/supported-field-types/knn-spaces/
We updated the cosine formula to use a single calculation (for both approximate and exact search) to keep it consistent across engines and search types. This should not affect the recall of your results. Do you see otherwise?
Sorry for the late response. @SamuelCox The new score from OpenSearch should be on a scale of 0 to 1 instead of 1 to 2 as before. Do you see OpenSearch returning a negative score when using the cosine similarity function in script score? If so, can you share one query vector and the matched input vector that caused the negative score? Thank you.
It's a breaking change if you were setting the MinScore property on Script_Score as now scores will come back as x-1, and will therefore affect the recall of your results. Of course it will.
We were wrong about it causing a negative score (that was a red herring from a different script_score query), but this is still clearly a breaking change.
Take the example query here:
{
  "script": {
    "source": "knn_score",
    "lang": "knn",
    "params": {
      "field": "searchableTextEmbedding",
      "query_value": [
        #embedding goes here
      ],
      "space_type": "cosinesimil"
    }
  },
  "min_score": 1.52
}
Prior to OpenSearch 2.19, all scores fell between 1 and 2, so some number of results would have a score > 1.52 and the query would return results. From OpenSearch 2.19 onward, scores fall only between 0 and 1, so the same query returns 0 results. So this is a breaking change that affects recall, unless I'm missing something? Of course, we will just change the min_score to be 1 less, but this is definitely a breaking change that affects recall if you use the min_score parameter.
@SamuelCox Are you seeing this behavior on a 2.19 cluster for an index that was created before 2.19?
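As a rough sketch of how a min_score threshold could be carried over, the Python below recovers the cosine cutoff implied by the old "cosine plus 1" score and re-expresses it under two candidate new scorings. Both candidates are assumptions, not confirmed by this thread: a plain shift (score = cos, which is what "1 less" implies) and a rescale (score = (1 + cos) / 2, which covers the full 0 to 1 range). The function names are hypothetical; check the documented formula for your version before picking one.

```python
def cosine_cutoff_from_old_min_score(old_min_score):
    # Old documented scoring: score = 1 + cos, so cos = score - 1.
    return old_min_score - 1.0


def new_min_score_if_shifted(old_min_score):
    # ASSUMPTION A: new score is the raw cosine (old score minus 1).
    return cosine_cutoff_from_old_min_score(old_min_score)


def new_min_score_if_rescaled(old_min_score):
    # ASSUMPTION B: new score is (1 + cos) / 2, i.e. the old score halved.
    cos = cosine_cutoff_from_old_min_score(old_min_score)
    return (1.0 + cos) / 2.0


old = 1.52
print(round(new_min_score_if_shifted(old), 4))   # 0.52
print(round(new_min_score_if_rescaled(old), 4))  # 0.76
```

The two candidates give noticeably different cutoffs, so it is worth confirming the formula with a non-identical query vector rather than relying on the identical-vector reproduction, where both produce the same top score.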
It only happens for indexes created with 2.19. Indexes carried forward from 2.18 still use the old 0-2 range. It was a surprise to see the scoring range change. If it was in the release notes, my eye didn't catch it.
@adurgin Yes, we moved to a consistent scoring formula for indices that are created on 2.19 or later. Old indices should not be impacted. As you mentioned, we added it to the release notes too: https://github.com/opensearch-project/k-NN/releases/tag/2.19.0.0
It was mentioned as an enhancement, but if it fundamentally affects recall on new indices, it should have been listed as a breaking change. It's as simple as that. Teams create new indices all the time, and if you're not aware that you need to make a code change after creating a new index, that's a problem.
This is a good lesson, thanks @SamuelCox. I think going forward we should treat scoring changes as breaking. Thanks for bringing this up.
Just adding a comment on this. Indeed, it's not clear from the change log that this was modified, and it also broke in prod for us. Even though it's not "breaking" in the sense that queries still run, it should still be treated as a breaking change and called out more clearly in the change log.