
Discussion on combination of result sets from different query types

jmazanec15 opened this issue

Recently, on the k-NN side, we have received interest in combining results from text-matching queries with k-NN queries (ref). So, I wanted to start a discussion about result combination and the problems that arise when scores are computed at the shard level.

From my understanding, the only way to combine scores from different queries currently is through a boolean query. In a boolean query, multiple query clauses are provided in the should or must section and the scores are combined via addition. On top of this, function scoring makes it possible to manipulate the scores further before they are added.

That being said, I see a few limitations with this approach.

First, it is difficult to combine result sets whose scores are produced via different scoring methods. To combine results effectively, the different queries' scores would need to be put on the same scale. By this, I mean that a score would need to meet two requirements: (1) indicate the relative relevance between its document and the other documents scored in the query, and (2) be comparable with the relative relevance of results from other queries. For example, the k-NN score range may be 0-1 while BM25 scores range from 0 to Float.MAX_VALUE (I think). With this, it would be difficult to figure out an effective way to weight each result appropriately. One way to do this would be to normalize the scores before combining them. Normalization might be possible through rescoring, but this happens at the shard level, which could cause problems when results are combined. For instance, if one shard has better results than another, per-shard normalization may skew the importance so that the top results from the weaker shard appear as good as those from the stronger one.
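To make the shard-level pitfall concrete, here is a small Python sketch (the min-max scheme and the example score ranges are my assumptions, not anything decided in this thread). Normalizing per shard stretches each shard's scores to [0, 1], which erases the global difference between a strong shard and a weak one:

```python
def min_max_normalize(scores):
    """Scale a list of scores to [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Shard A holds globally strong BM25 hits; shard B holds weak ones.
shard_a = [12.0, 10.0, 8.0]
shard_b = [3.0, 2.0, 1.0]

# Per-shard normalization erases the difference: both top hits become 1.0.
print(min_max_normalize(shard_a))  # [1.0, 0.5, 0.0]
print(min_max_normalize(shard_b))  # [1.0, 0.5, 0.0]

# Normalizing over the global result set preserves it.
print(min_max_normalize(shard_a + shard_b))
```

In the per-shard case the coordinator would see shard B's best hit tied with shard A's best, even though every hit on shard A outscored every hit on shard B.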

Second, it is not possible to consider global hits for re-ranking. Because scores are assigned at the shard level, any rescoring has to be done at the shard level as well. I see a problem with this in two cases. First, for score combination: if an index has a significant number of shards, a user may want only the top results to be combined, instead of combining the results from every shard and then aggregating at the coordinator. Second, in the future, a user may want to re-rank results with an expensive model that they don't want to run on every shard in the index.

That being said, my preliminary thought for a solution would be to create some kind of search-and-merge API that might look like:

GET /{index_name}/_search_and_merge
{
    "queries": [
      {
        "query": {
          "knn": {
            "my_vector2": {
              "vector": [2, 3, 5, 6],
              "k": 2
            }
          }
        }
      },
      {
        "query": {
          "match_all": {}
        }
      }
    ],
    "merge": {
      "norm": true,
      "disjunctive": false,
      "strategy": "linear_combo"
      "params": {
        "weight_1": 10.2,
        "weight_2": 0.25
      }
    },
    "size": 100
}

Here, queries would be a list of queries to be executed independently, and merge would contain the logic to combine the results at the coordinator level.
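As a minimal sketch of what a coordinator-side linear_combo strategy could do, assuming min-max normalization for the norm flag and the weight_1/weight_2 params from the example above (none of this is a committed API, and the function names are mine):

```python
def merge_linear_combo(results_1, results_2, weight_1, weight_2, size, norm=True):
    """Combine two {doc_id: score} result sets with a weighted sum
    at the coordinator, returning the top-`size` (doc_id, score) pairs."""
    def normalize(results):
        if not norm or not results:
            return dict(results)
        lo, hi = min(results.values()), max(results.values())
        span = (hi - lo) or 1.0  # avoid division by zero for constant scores
        return {doc: (s - lo) / span for doc, s in results.items()}

    r1, r2 = normalize(results_1), normalize(results_2)
    # A document missing from one result set contributes 0 from that query.
    combined = {doc: weight_1 * r1.get(doc, 0.0) + weight_2 * r2.get(doc, 0.0)
                for doc in set(r1) | set(r2)}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:size]

# k-NN scores already in [0, 1]; BM25 scores unbounded.
knn = {"d1": 0.9, "d2": 0.4}
bm25 = {"d2": 14.0, "d3": 7.0}
print(merge_linear_combo(knn, bm25, weight_1=10.2, weight_2=0.25, size=3))
```

The point of the sketch is only that the weighting happens after both result sets reach the coordinator, not per shard.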

All that being said, I want to see what people think of doing this. Is there a way to accomplish this with the current search interface?

jmazanec15 avatar Sep 19 '22 22:09 jmazanec15

Would it make sense to extend rescoring to run on a higher level than shard (i.e. merge)? Another idea is that merge could be seen as any map/reduce, so rescore (map) and merge (reduce)? That could scale like map/reduce as well.

dblock avatar Sep 20 '22 20:09 dblock

In general, I think it would be similar to rescoring at the coordinator level. The workflow I have in mind would be:

  1. Execute multiple queries that would return results to coordinator (almost like _msearch API)
  2. Pre-process results from each query (i.e. normalization)
  3. Expose rescore functionality based on these results (take into account score combination)
  4. Reduce results based on score and execute fetch phase

I think extending rescore might be confusing from a user perspective (i.e., when does it happen: at the shard level or the coordinator level?).

jmazanec15 avatar Sep 21 '22 00:09 jmazanec15

Would it make sense to extend rescoring to run on a higher level than shard (i.e. merge)? Another idea is that merge could be seen as any map/reduce, so rescore (map) and merge (reduce)? That could scale like map/reduce as well.

Yes, this can be seen as a possible solution, and I think we should do some brainstorming around how we want to do it.

But from a requirements standpoint, what Jack mentioned in the comment above captures the exact requirements. Whether we call it rescore or some other name is something we can debate.

navneet1v avatar Sep 27 '22 07:09 navneet1v

Take a look at this paper from the Pinecone team: https://arxiv.org/abs/2210.11934

Convex Combination shows promise as a simple/effective way to combine the two hybrid scores. (Normalize each, and regress to find a linear scaling factor between the two distributions).

Distributed normalization of course is the challenge. Shard-level might be workable.

I’d suggest starting with simply allowing the scores from each query type to be returned and combined as the developer sees fit.

For example convex combination could be implemented by two script_score queries inside a dismax tie=1 (where each script score employed a precomputed normalization function and boost for each query type).
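A hedged sketch of the convex combination idea: the tiny judged dataset below is made up, and the grid search stands in for the regression step the paper uses to fit the scaling factor between the two (already normalized) score distributions.

```python
def convex_combine(bm25_norm, knn_norm, alpha):
    """Convex combination of two normalized scores; alpha in [0, 1]."""
    return alpha * bm25_norm + (1.0 - alpha) * knn_norm

def fit_alpha(judged, steps=100):
    """Grid-search the alpha that ranks the relevant doc above the
    non-relevant one in the most judged pairs. Each pair is
    ((bm25, knn) of the relevant doc, (bm25, knn) of the non-relevant doc),
    with scores already normalized to [0, 1]."""
    best_alpha, best_correct = 0.0, -1
    for i in range(steps + 1):
        alpha = i / steps
        correct = sum(
            convex_combine(*rel, alpha) > convex_combine(*non, alpha)
            for rel, non in judged
        )
        if correct > best_correct:
            best_alpha, best_correct = alpha, correct
    return best_alpha

# Toy judged pairs (invented): relevant docs tend to have higher knn scores,
# so the fitted alpha leans toward the knn side.
judged = [((0.2, 0.9), (0.8, 0.1)),
          ((0.4, 0.8), (0.9, 0.3)),
          ((0.7, 0.9), (0.3, 0.2))]
alpha = fit_alpha(judged)
print(alpha, convex_combine(0.5, 0.7, alpha))
```

With real judgments one would fit alpha offline per corpus, which is the precomputation the dis_max/script_score workaround above relies on.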

peterdm avatar Dec 20 '22 12:12 peterdm

What will be the order of execution of the BM25 (maybe multi-match) query and the k-NN query if we use a hybrid search query?

aishwaryabajaj-54 avatar Dec 29 '22 04:12 aishwaryabajaj-54

We have not decided yet. I am working on a proposal and will update this issue once I have done some research. At a high level, I was thinking of running the queries in parallel to reduce latency, and then normalizing the scores before combining them.

But this is still a very early thought; we will need to see what works best.

navneet1v avatar Jan 06 '23 20:01 navneet1v

What will be the order of execution of the BM25 (maybe multi-match) query and the k-NN query if we use a hybrid search query?

@aishwaryabajaj-54 do we see a difference if the queries are performed in a different order? As I understand it, if we normalize the scores for each query and then combine them, it doesn't matter in which order the queries are executed.

Please let us know if you think otherwise, or if there is a case where this assumption doesn't hold true.

navneet1v avatar Jan 09 '23 07:01 navneet1v

Would it make sense to extend rescoring to run on a higher level than shard (i.e. merge)? Another idea is that merge could be seen as any map/reduce, so rescore (map) and merge (reduce)? That could scale like map/reduce as well.

@dblock The idea of extending rescoring to a level higher than the shard looks very promising. Along different lines, I was thinking of building another phase into the overall Query and Fetch flow. Different plugins could use it to perform operations at the coordinator level before control passes to the fetch phase. I will try to create a proposal for this to start some discussion.

navneet1v avatar Jan 09 '23 08:01 navneet1v

Thanks @navneet1v. Since we're talking about additional phases in query and fetch, cc-ing @nknize, who has been thinking about refactoring aggregation, which I believe is either on the same path or at least related.

dblock avatar Jan 10 '23 16:01 dblock

I think Elasticsearch has a solution for this (https://www.elastic.co/guide/en/elasticsearch/reference/master/knn-search.html#_combine_approximate_knn_with_other_features); maybe it could be replicated in OpenSearch? I've created a feature request for this too, with the same recommendation: https://github.com/opensearch-project/k-NN/issues/717

I realised that the query below simply retrieves based on keyword matching and then re-ranks based on cosinesimil, which leaves out many relevant candidates in vector space.

body = {
 "size": 4,
 "query": {
   "script_score": {
     "query": {
       "query_string": {
           "fields": ["category"],
           "query": "hairdresser"
       }
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "text_embeddings",
         "query_value": [0.1, 1.0, 0.3, 0.0],
         "space_type": "cosinesimil"
       }
     }
   }
 }
}

rhvaz avatar Jan 11 '23 17:01 rhvaz

I've tried to adapt the recommended answer on a test index from this https://stackoverflow.com/questions/67701323/full-text-and-knn-vector-hybrid-search-for-elastic and it seems to work for me. Any thoughts on this solution? Perhaps latency concerns?

body = {
  "size": 5,
  "query": {
    "function_score": {
      "query": {
        "bool": { 
          "should" : [
            {
              "multi_match" : { 
                "query": "electrician",
                "fields": ["category"]
              }
            },
            {
              "match_all": { }
            }
          ],
          "minimum_should_match" : 0
        }
      },
      "functions": [
      {
        "script_score" : {
          "script" : {
            "source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.query_value, doc['my_vector']) + _score",
            "params": {
              "query_value": [1.0, 1.0, 0.0, 0.0]
            }
          }
        },
        "weight": 1
      }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}

rhvaz avatar Jan 11 '23 17:01 rhvaz

I think ElasticSearch has a solution for this https://www.elastic.co/guide/en/elasticsearch/reference/master/knn-search.html#_combine_approximate_knn_with_other_features, maybe this could be replicated in OpenSearch? I've created a feature request for this too with the same recommendation opensearch-project/k-NN#717

I have responded on the feature request. Let's use that issue to talk more about it.

Coming back to the problem statement for this particular GitHub issue: all the current score-combination mechanisms work at the shard level. We don't want to combine the scores at the shard level, but at the coordinator node level. Also, the scores generated by keyword match and k-NN are not on the same scale: k-NN scores can be between 0 and 1, while BM-25 scores are floating-point values that can exceed 1. Quoting from the description:

First, it is difficult to combine result sets whose scores are produced via different scoring methods. To combine results effectively, the different queries' scores would need to be put on the same scale. By this, I mean that a score would need to meet two requirements: (1) indicate the relative relevance between its document and the other documents scored in the query, and (2) be comparable with the relative relevance of results from other queries. For example, the k-NN score range may be 0-1 while BM25 scores range from 0 to Float.MAX_VALUE (I think). With this, it would be difficult to figure out an effective way to weight each result appropriately. One way to do this would be to normalize the scores before combining them. Normalization might be possible through rescoring, but this happens at the shard level, which could cause problems when results are combined. For instance, if one shard has better results than another, per-shard normalization may skew the importance so that the top results from the weaker shard appear as good as those from the stronger one.

navneet1v avatar Jan 11 '23 18:01 navneet1v

k-NN scores can be between 0 and 1, while BM-25 scores are floating-point values that can exceed 1.

Yeah, that makes sense. One way could be to use a sigmoid function on the BM-25 score to map it into the range 0 to 1. The parameters of the function could be hyperparameters that the user selects. For example:

1 / (1 + Math.exp( - a * (BM25_score - b)))

Here, a controls the slope of the curve and b the point of transition.
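As a runnable sketch of the suggested mapping (the function name and the default values of a and b are mine; the user would tune them):

```python
import math

def squash_bm25(score, a=0.1, b=10.0):
    """Map an unbounded BM25 score into (0, 1) with a logistic curve.
    a controls the slope; b is the score at which the output crosses 0.5."""
    return 1.0 / (1.0 + math.exp(-a * (score - b)))

print(squash_bm25(10.0))   # 0.5: the score equals the midpoint b
print(squash_bm25(50.0))   # well above b, approaches 1
print(squash_bm25(0.0))    # below b, under 0.5
```

After this squashing, the BM25 and k-NN scores live on comparable scales and a weighted sum becomes meaningful.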

rhvaz avatar Jan 12 '23 09:01 rhvaz

We want to use something similar to get results from a vector field along with exact match. Currently, the scores received from the two queries are not on the same scale, as mentioned in the discussion above. The following is the query we would expect to use to achieve this.

body = {
 "size": 10,
 "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
              "query": "smart tv",
              "fields": ["summary"],
              "_name": "multi_match",
              "type": "phrase"
          }
        },
        {
          "knn": {
            "text_vector": {
              "vector": [2, 3, 5, 6],
              "k": 10,
                "_name": "knn"
            }
          }
        }
      ]
    }
  }
}

Will the queries above run in parallel or in series? The search speed of BM25 is relatively slow.

ankitas3 avatar Jan 16 '23 07:01 ankitas3

Hi @ankitas3, the queries in the should clause don't run in parallel. Consider a shard on a data node that has, say, 3 segments. The sub-queries (k-NN and multi-match) execute on the segments one by one, sequentially, and the scores are combined at the bool query level for each shard.

The speed of BM-25 depends on many factors. I would recommend running the multi-match query separately to find out whether BM-25 is really running slowly; if so, please try to optimize the BM-25 query.

navneet1v avatar Jan 17 '23 19:01 navneet1v

@navneet1v, in continuation of what @ankitas3 is doing, is it possible to execute the queries in parallel, or maybe to specify the execution order to reduce the latency?

aishwaryabajaj-54 avatar Jan 18 '23 06:01 aishwaryabajaj-54

@navneet1v I tried running BM25 separately, and it is slow. Can you share any references for improving its speed?

ankitas3 avatar Jan 18 '23 06:01 ankitas3

@navneet1v, in continuation of what @ankitas3 is doing, is it possible to execute the queries in parallel, or maybe to specify the execution order to reduce the latency?

@ankitas3, @aishwaryabajaj-54

Right now there is no way to execute them in parallel in a single query. You can look at the _msearch API, though I am not sure that is what you are looking for; it works something like the _bulk API.

For improving the speed of the query, I would recommend cutting a separate GitHub issue, as this one is not about improving BM-25 speed. In the new issue, please make sure to provide details like the OpenSearch version, operating system, index and shard configuration, number of nodes, etc. On that issue we can talk about what you can do to reduce the latency.

@ankitas3 Some places I would start looking are:

  1. The segment count for the index: more segments per replica means more latency.
  2. Whether your cluster is under-scaled: check the CPU utilization of the data nodes.
  3. The service-side latency, i.e., the latency reported in the API response rather than the latency seen by the client. This can help pinpoint where the latency is coming from.

navneet1v avatar Jan 18 '23 06:01 navneet1v

@navneet1v, I have tried _msearch and it has its own cons. Can we consider speed and relevance benchmarking of parallel execution and score normalization for both queries (BM-25 and ANN) in the same GH issue?

aishwaryabajaj-54 avatar Jan 18 '23 06:01 aishwaryabajaj-54

Can we consider speed and relevance benchmarking of parallel execution and score normalization for both queries (BM-25 and ANN) in the same GH issue?

@aishwaryabajaj-54 can you please elaborate? What is your ask here, and what is it related to?

navneet1v avatar Jan 18 '23 07:01 navneet1v

I am closing this issue, as the problem described above is being addressed via https://github.com/opensearch-project/neural-search/issues/123. Please give a +1 on that issue to support the work. I will keep it updated and use it to track progress.

navneet1v avatar Feb 24 '23 21:02 navneet1v