opensearch-benchmark
Update Vectorsearch Core Operations: Add script score query for exact nearest neighbor search
Description
For vector search, we currently benchmark only approximate nearest neighbor (ANN) methods. Some users also want to run exact k-NN search, which returns the true nearest neighbors for each query. To do this, they can use the following script score query:
GET my-knn-index-1/_search
{
  "size": 4,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "knn_score",
        "lang": "knn",
        "params": {
          "field": "my_vector2",
          "query_value": [2.0, 3.0, 5.0, 6.0],
          "space_type": "l2"
        }
      }
    }
  }
}
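Conceptually, a `script_score` query wrapped around `match_all` scores every document in the index and returns the top `size` hits, which is what makes it exact rather than approximate. The sketch below reproduces that behavior in plain Python for a handful of in-memory vectors. The document IDs and vectors are hypothetical, and ranking by squared L2 distance is an assumption that the plugin's `l2` score decreases as distance grows, so the ordering matches.

```python
import heapq


def exact_knn(query_value, docs, size):
    """Brute-force exact k-NN: compute the distance to every document and keep the `size` closest.

    `docs` maps a document ID to its vector (a hypothetical stand-in for the index contents).
    Results are ranked by ascending squared L2 distance, assumed to give the same order as the
    k-NN plugin's "l2" space_type score.
    """

    def squared_l2(vec):
        return sum((q - v) ** 2 for q, v in zip(query_value, vec))

    # Iterating a dict yields its keys, so this returns the IDs of the `size` nearest documents.
    return heapq.nsmallest(size, docs, key=lambda doc_id: squared_l2(docs[doc_id]))


docs = {
    "doc1": [1.0, 2.0, 3.0, 4.0],
    "doc2": [2.0, 3.0, 5.0, 6.0],
    "doc3": [9.0, 9.0, 9.0, 9.0],
}
print(exact_knn([2.0, 3.0, 5.0, 6.0], docs, 2))  # ['doc2', 'doc1']
```

Because every document is visited, the cost per query grows linearly with index size and vector dimension, which is why exact search is a useful baseline to benchmark against ANN methods.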
docs: https://opensearch.org/docs/latest/search-plugins/knn/knn-score-script/
I want to add this query as another option for benchmarking purposes.
@jmazanec15 You can use the generic search operation to run the above query. However, it will not report recall for the script query. Is recall required as part of your results? IIRC, the search operation supports a fixed query but not dynamic values. @gkamat @IanHoang, can you correct me if I am wrong? Thanks
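For context on the recall question: recall@k is usually measured by comparing the IDs an ANN query returns with the true top-k neighbors, such as those an exact script score query returns. The following is a minimal sketch under that assumption; the function and argument names are hypothetical, and this is not how OSB computes recall internally.

```python
def recall_at_k(ann_ids, exact_ids, k):
    """Fraction of the true top-k neighbors (from exact search) that the ANN results also contain."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must not be empty")
    found = set(ann_ids[:k])
    # Divide by the number of ground-truth IDs, which is k unless fewer were supplied.
    return len(truth & found) / len(truth)


# ANN found 2 of the 3 true neighbors.
print(recall_at_k(["b", "c", "d"], ["b", "a", "d"], 3))
```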
@jmazanec15 @VijayanB ~~Are you requesting that this query be added to the vectorsearch workload specifically? If so, please create an issue in the OSB workloads repository instead and feel free to link the contents there.~~
EDIT: Reopened, since this would apply to the built-in OSB core operations that were added a while back.
Search operations can also support dynamic values through custom parameter sources (param sources). To use one, do the following:
- Add this query as an operation in `operations/default.json`. Instead of adding the `body` field, add a field called `param-source` set to your custom param source name, for example "knn-score-script-custom-params".
- Define a function in `workload.py` and register it as the param source for "knn-score-script-custom-params".
For examples of both steps, see the geonames workload:
- Step 1 example: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/5ef87710a2170ab65e5e6927be6ccc00668e9a65/geonames/operations/default.json#L253-L257
- Step 2 example: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/geonames/workload.py#L43-L73
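Putting the two steps together, here is a minimal sketch of what the `workload.py` side might look like, modeled loosely on the geonames pattern. It assumes OSB calls a function param source as `fn(workload, params, **kwargs)` and hands the returned dict to the search runner, and that `register(registry)` with `registry.register_param_source(...)` is how the source gets registered; verify both against the geonames example linked above. The `dimension`, `k`, `field`, and `space_type` params are hypothetical names chosen for illustration, and only "knn-score-script-custom-params" comes from this thread.

```python
import random


def knn_score_script_param_source(workload, params, **kwargs):
    """Hypothetical param source that builds an exact k-NN script score query with a fresh vector.

    Assumes the OSB calling convention fn(workload, params, **kwargs); the "dimension", "k",
    "field" and "space_type" params are illustrative, not defined by OSB.
    """
    # Use the workload's only index by default, but let the operation override it.
    if len(workload.indices) == 1:
        default_index = workload.indices[0].name
    else:
        default_index = "_all"
    index_name = params.get("index", default_index)

    # Draw a new random query vector on every call so successive requests differ.
    dimension = params.get("dimension", 4)
    query_value = [random.uniform(-1.0, 1.0) for _ in range(dimension)]

    return {
        "index": index_name,
        "cache": params.get("cache", False),
        "body": {
            "size": params.get("k", 4),
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": "knn_score",
                        "lang": "knn",
                        "params": {
                            "field": params.get("field", "my_vector2"),
                            "query_value": query_value,
                            "space_type": params.get("space_type", "l2"),
                        },
                    },
                },
            },
        },
    }


def register(registry):
    # Map the name used in operations/default.json to the function above.
    registry.register_param_source("knn-score-script-custom-params", knn_score_script_param_source)
```

On the `operations/default.json` side, the operation would then carry `"param-source": "knn-score-script-custom-params"` in place of `"body"`, alongside its usual search operation type. Drawing random vectors keeps the sketch self-contained; a real benchmark would more likely read query vectors from the dataset so results are reproducible.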