
Possible focus point query improvements

Open orangejulius opened this issue 5 years ago • 0 comments

Summary

Today we calculate focus point bias by adding together a score for text match quality and a score for distance from the focus point. Because of the way Elasticsearch calculates scores, these two values are often of completely different magnitudes. This can result in one of two problems:

  • A poor text match that is slightly closer to the focus point is shown before a much better text match.
  • A result that is much farther from the focus point but is a slightly better text match is shown first.
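To make the two failure modes concrete, here is a minimal sketch of additive scoring. All numbers are hypothetical and chosen only to illustrate the magnitude mismatch; real Elasticsearch scores depend on the index and the query, and `additive_score` is an illustrative helper, not part of Pelias.

```python
# Hypothetical scores only: they illustrate the magnitude mismatch and are
# not taken from a real Elasticsearch response.
def additive_score(text_score: float, distance_score: float) -> float:
    """Combine the two signals by plain addition, as described above."""
    return text_score + distance_score


# Case 1: distance scores are much larger than text scores, so a small
# difference in distance outweighs a large difference in text quality.
good_match_far = additive_score(text_score=0.9, distance_score=10.0)
poor_match_near = additive_score(text_score=0.1, distance_score=12.0)

# Case 2: text scores are much larger than distance scores, so a small
# difference in text quality outweighs a large difference in distance.
slightly_better_far = additive_score(text_score=105.0, distance_score=0.1)
slightly_worse_near = additive_score(text_score=100.0, distance_score=1.0)

print(poor_match_near > good_match_far)           # poor but near match ranks first
print(slightly_better_far > slightly_worse_near)  # far match ranks first
```

In both cases the signal with the larger magnitude effectively decides the ordering, and the other signal acts as little more than a tie-breaker.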

Details

A rough outline of our current autocomplete query structure when using a focus point is as follows:

{
  "must": {
    // "multiple queries of text match logic"
  },
  "should": [{
    "function_score": {
      "query": {
        // "ONLY ONE of the potentially multiple queries for text matching"
      },
      "functions": [
        // "`center_point` query to handle focus point"
      ],
      "boost_mode": "replace"
    }
  }, {
    // "other should queries for text matching, popularity, etc"
  }]
}

Note: the comments are placeholders for more complicated query logic.

Elasticsearch offers guidance that scores from different queries generally cannot be compared.

Although these clauses are composed into a single query, for the purposes of scoring each sub-clause is effectively treated as its own query, and the resulting scores are then added together. I believe this leads to incorrect results.

Potential solution

Instead, I think it would be valuable (and much simpler) if the function_score query wrapped all the text matching query clauses. For example, something like this:

{
  "function_score": {
    "query": {
      "bool": {
        "must": [{
          // "any required text matching queries"
        }],
        "should": [{
          // "any optional queries such as text matching, popularity boosts, etc"
        }]
      }
    },
    "functions": [
      // "`center_point` query to handle focus point"
    ],
    "boost_mode": "multiply"
  }
}

The primary change is that the focus point function score has been brought to the top level, and the boost_mode has been changed to multiply.
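One reason multiplying may behave better than adding: under multiplication, the ranking does not depend on the overall magnitude of the text scores, whereas under addition it does. The sketch below models this with hypothetical numbers. The `combine` and `rank` helpers, the candidate names, and the `text_scale` parameter are all illustrative assumptions, and the distance factor stands in for whatever value the focus point function actually produces.

```python
def combine(text_score: float, distance_factor: float, mode: str) -> float:
    """Combine a text score with a distance factor by 'sum' or 'multiply'."""
    if mode == "sum":
        return text_score + distance_factor
    if mode == "multiply":
        return text_score * distance_factor
    raise ValueError(f"unknown mode: {mode}")


def rank(candidates: dict, mode: str, text_scale: float = 1.0) -> list:
    """Return candidate names ordered best first.

    text_scale simulates text scores landing in a different magnitude range,
    for example because a different text query produced them.
    """
    scored = {
        name: combine(text * text_scale, distance, mode)
        for name, (text, distance) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical (text score, distance factor) pairs; distance factor is in (0, 1].
candidates = {
    "strong match, far": (8.0, 0.6),
    "weak match, near": (4.0, 1.0),
}

for scale in (1.0, 0.01):
    print(scale, "sum     ", rank(candidates, "sum", scale))
    print(scale, "multiply", rank(candidates, "multiply", scale))
```

With these numbers, the `sum` ordering flips when the text scores shrink by a factor of 100, while the `multiply` ordering stays the same. This is only a model of the scoring arithmetic, not a test against a real index.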

Examples

At the moment, I don't have any "off the shelf" examples that I believe come down entirely to this issue, but I will update the issue if I find any. When I've observed this, it has been during development while tweaking other parameters, so it is hard to reproduce.

At the moment, I think https://github.com/pelias/pelias/issues/862 and https://github.com/pelias/pelias/issues/849 are generally masking cases where this will become an issue once they are solved. We should probably look at those first.

orangejulius avatar Oct 11 '18 18:10 orangejulius