
How to improve focus point queries

Open · Joxit opened this issue 2 years ago · 1 comment

Use-cases

I tried to search for the small town of Vars (Hautes-Alpes, France) with a focus point. The request is /v1/autocomplete?lang=fr&focus.point.lat=48.03661925338169&focus.point.lon=6.580299229512&text=vars (the focus point is in France).
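To reproduce the request, the query string can be built programmatically. This is a minimal Python sketch using only the standard library. The host is omitted, as in the request above, and the coordinates are kept as strings so they come out exactly as written.

```python
from urllib.parse import urlencode

# Parameters copied from the autocomplete request in this issue.
params = {
    "lang": "fr",
    "focus.point.lat": "48.03661925338169",
    "focus.point.lon": "6.580299229512",
    "text": "vars",
}

# urlencode keeps dict insertion order and leaves "." unescaped.
query = "/v1/autocomplete?" + urlencode(params)
print(query)
```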

The current results are:

0) Varsovie, MZ, Pologne
1) Varsinais-Suomi, Finlande
2) Varsovie, Johns Creek, GA, USA
3) Ham-sous-Varsberg, France
4) Varsity Center, Lehigh Acres, FL, USA
5) Varsity Lakes, QLD, Australie
6) Varsovie, Throop, PA, USA
7) Varshets, MT, Bulgarie
8) Obshtina Varshets, MT, Bulgarie
9) Varsberg, France

This specific use case may be solved by #1202, but I think we can improve focus point scoring too.

Attempted Solutions

Here is the current boosting system:
[
  {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "weight": 15,
          "exp": {
            "center_point": {
              "origin": { "lat": 48.03661925338169, "lon": 6.580299229512 },
              "offset": "0km",
              "scale": "50km",
              "decay": 0.5
            }
          }
        }
      ],
      "score_mode": "avg",
      "boost_mode": "replace"
    }
  },
  {
    "function_score": {
      "query": { "match_all": {} },
      "max_boost": 20,
      "functions": [
        {
          "field_value_factor": {
            "modifier": "log1p",
            "field": "popularity",
            "missing": 1
          },
          "weight": 1
        }
      ],
      "score_mode": "first",
      "boost_mode": "replace"
    }
  },
  {
    "function_score": {
      "query": { "match_all": {} },
      "max_boost": 20,
      "functions": [
        {
          "field_value_factor": {
            "modifier": "log1p",
            "field": "population",
            "missing": 1
          },
          "weight": 3
        }
      ],
      "score_mode": "first",
      "boost_mode": "replace"
    }
  }
]
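To make these numbers easier to reason about, here is a small Python model of the three clauses. It follows the formulas in the Elasticsearch function_score documentation (the "exp" decay function, and field_value_factor with the "log1p" modifier, i.e. log10(1 + value)) rather than any Pelias code. Distances are passed in directly as kilometres instead of being computed from center_point. For the focus clause it models only the distance-decay factor in [0, 1], because how the weight of 15 combines with score_mode "avg" and the rest of the query is not shown here.

```python
import math
from typing import Optional


def exp_decay(distance_km: float, offset_km: float = 0.0,
              scale_km: float = 50.0, decay: float = 0.5) -> float:
    """Elasticsearch's documented "exp" decay: 1.0 inside the offset, then
    exp(lambda * (d - offset)) with lambda = ln(decay) / scale, so the
    factor equals `decay` at exactly offset + scale."""
    lam = math.log(decay) / scale_km
    return math.exp(lam * max(0.0, distance_km - offset_km))


def log1p_boost(value: Optional[float], weight: float,
                max_boost: float = 20.0, missing: float = 1.0) -> float:
    """field_value_factor with modifier "log1p" (log10(1 + value)),
    multiplied by the function weight and capped by max_boost."""
    if value is None:  # field absent on the document: use `missing`
        value = missing
    return min(weight * math.log10(1.0 + value), max_boost)


# Focus: decay factor only, at a few distances from the focus point.
for d in (0, 50, 100, 200):
    print(f"{d:>3} km from focus: decay factor {exp_decay(d):.4f}")

# Popularity uses weight 1, population weight 3; both are capped at 20.
print(f"popularity, field missing: {log1p_boost(None, weight=1):.3f}")
print(f"population 999,999: {log1p_boost(999_999, weight=3):.3f}")
print(f"population 10,000,000: {log1p_boost(10_000_000, weight=3):.3f}")
```

With these formulas a place 100 km from the focus point keeps a quarter of the decay factor, while the population boost keeps growing logarithmically until it hits the cap.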

To summarize, there are three boosts: focus, popularity, and population. The maximum value for focus is 16 (when the wanted point is exactly at the focus point). For popularity and population, the maximum value is 20 (their max_boost).

This means that even when the wanted point gets its maximum focus score, it still cannot exceed the popularity/population boosts.
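Taking the focus maximum of 16 at face value, and assuming "log1p" is Elasticsearch's base-10 log10(1 + value), the population boost can be inverted to see which populations already match a perfect focus score. A quick Python sketch of that arithmetic:

```python
FOCUS_MAX = 16.0         # maximum focus value, as stated above
MAX_BOOST = 20.0         # max_boost of the popularity and population clauses
POPULATION_WEIGHT = 3.0
POPULARITY_WEIGHT = 1.0


def value_for_boost(target: float, weight: float) -> float:
    """Invert weight * log10(1 + v) = target to find the field value v."""
    return 10 ** (target / weight) - 1


# Population at which the population boost alone equals the focus maximum.
pop_equal_focus = value_for_boost(FOCUS_MAX, POPULATION_WEIGHT)
# Population at which the population boost reaches the max_boost cap.
pop_at_cap = value_for_boost(MAX_BOOST, POPULATION_WEIGHT)
# Popularity has weight 1, so it needs a huge value to reach the cap.
popularity_at_cap = value_for_boost(MAX_BOOST, POPULARITY_WEIGHT)

print(f"population boost = {FOCUS_MAX:g} at population ~{pop_equal_focus:,.0f}")
print(f"population boost capped at {MAX_BOOST:g} from ~{pop_at_cap:,.0f}")
print(f"popularity boost capped at {MAX_BOOST:g} only from ~{popularity_at_cap:.3g}")
```

If this arithmetic holds, any place with roughly 215,000 inhabitants or more gets a population boost at least as large as the best possible focus score, and that is before any popularity boost is added on top.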

I suggest that when focus.point is present, we reduce the max_boost of popularity and population; the new max_boost could be somewhere between 8 and 12, for example.
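The proposal could look roughly like this when the two field-based clauses are built. This is a hypothetical Python helper, not Pelias code: field_boost_clause, popularity_population_clauses and the value 10 (picked from the suggested 8 to 12 range) are illustrative choices. It emits the same JSON shape as the clauses above and only swaps max_boost when a focus point is present.

```python
DEFAULT_MAX_BOOST = 20
FOCUS_MAX_BOOST = 10  # hypothetical pick within the suggested 8-12 range


def field_boost_clause(field: str, weight: int, max_boost: int) -> dict:
    """One function_score clause shaped like the popularity/population ones."""
    return {
        "function_score": {
            "query": {"match_all": {}},
            "max_boost": max_boost,
            "functions": [{
                "field_value_factor": {
                    "modifier": "log1p",
                    "field": field,
                    "missing": 1,
                },
                "weight": weight,
            }],
            "score_mode": "first",
            "boost_mode": "replace",
        }
    }


def popularity_population_clauses(has_focus_point: bool) -> list:
    """Hypothetical helper: lower max_boost only when focus.point is set."""
    max_boost = FOCUS_MAX_BOOST if has_focus_point else DEFAULT_MAX_BOOST
    return [
        field_boost_clause("popularity", 1, max_boost),
        field_boost_clause("population", 3, max_boost),
    ]


with_focus = popularity_population_clauses(has_focus_point=True)
print([c["function_score"]["max_boost"] for c in with_focus])
```

Keeping the cap as a single parameter would make it easy to try several values between 8 and 12 against the same test queries.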

I will try to draft a PR next week.

Do you have any examples of a query using a focus point where the popular city should still be ranked first?

Joxit, Oct 15 '21 14:10

Yeah, this is a good analysis. The other component involved in the scoring is, of course, the text match score. This is pretty complicated and depends on the number of terms in the input query, how the parser parsed things, the number of alt-names a record has, etc.

It might be interesting to try to estimate a "maximum" score for various text lengths and, if needed, different parsing scenarios. Then we would know all the components going into the total score and could attempt to balance them more appropriately.
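One way to start on that estimate is a simple score budget: list each component's maximum and see what share of the total it can claim. In this Python sketch the boost maxima come from the analysis above, the components are assumed to simply add up, and TEXT_MAX is a purely hypothetical placeholder, because the real text-match maximum depends on query length, parsing, and alt-names.

```python
def score_budget(component_max: dict) -> dict:
    """Return each component's share of the maximum possible total score."""
    total = sum(component_max.values())
    return {name: value / total for name, value in component_max.items()}


# Hypothetical placeholder: the real value would have to be estimated per
# query length and parsing scenario.
TEXT_MAX = 20.0

budget = score_budget({
    "text": TEXT_MAX,
    "focus": 16.0,
    "popularity": 20.0,
    "population": 20.0,
})
for name, share in budget.items():
    print(f"{name:>10}: {share:.1%}")
```

Re-running it with a text maximum estimated for each query length or parse would show how much room the focus boost actually has in each scenario.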

Funnily enough, right now we can have two almost opposite problems at the same time:

  • nearby, exact text matches can score below a far away populated/popular place that is a poor text match
  • distant, exact text matches for popular places can score below a nearby place with a good text match

orangejulius, Oct 15 '21 15:10