opensearch-benchmark icon indicating copy to clipboard operation
opensearch-benchmark copied to clipboard

Update Vectorsearch Core Operations: Add script score query for exact nearest neighbor search

Open jmazanec15 opened this issue 1 year ago • 2 comments

Description

For vector search, right now, we benchmark ANN methods that approximate nearest neighbor search. In addition to this, some users want to run exact k-NN search that returns the exact nearest neighbors per query. To do this, users can use the following query:

GET my-knn-index-1/_search
{
 "size": 4,
 "query": {
   "script_score": {
     "query": {
       "match_all": {}
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "my_vector2",
         "query_value": [2.0, 3.0, 5.0, 6.0],
         "space_type": "l2"
       }
     }
   }
 }
}

docs: https://opensearch.org/docs/latest/search-plugins/knn/knn-score-script/

I want to add this query as another option for benchmarking purposes.

jmazanec15 avatar Apr 17 '24 15:04 jmazanec15

@jmazanec15 You can use generic search operation to perform above query. However, this will not give you recall for script. Is recall required for part of your results? IIRC, search operation can support fixed query, it can't support dynamic values. @gkamat @IanHoang Can you correct me if i am wrong? Thanks

VijayanB avatar Apr 17 '24 17:04 VijayanB

@jmazanec15 @VijayanB ~~Are you requesting that this query be added to the vectorsearch workload specifically? If so, please make create an issue in the OSB workloads repository instead and feel free to link the contents there.~~ EDIT: Reopened as these would apply to already built-in OSB core operations that were added a while back.

Search operations can also support dynamic values, through the help of custom param sources. To do this, you'll need to do the following:

  1. Incorporate this query as an operation in operations/default.json. Instead of adding the body field, add a field called param-source with your custom param source name, let's say "knn-score-script-custom-params".
  2. Define a function in workloads.py and register that method as the param source for "knn-score-script-custom-params".

To see examples of these steps, see the geonames workload.

  • Step 1 example: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/5ef87710a2170ab65e5e6927be6ccc00668e9a65/geonames/operations/default.json#L253-L257
  • Step 2 example: https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/geonames/workload.py#L43-L73

IanHoang avatar Apr 18 '24 18:04 IanHoang