OpenSearch

[BUG] default_search analyzer in index settings overrides the analyzer defined in mapping

gaobinlong opened this issue 1 year ago • 5 comments

Describe the bug

When a default_search analyzer is defined in the index settings and a field's mapping also defines an analyzer, indexing uses the analyzer from the mapping but searching uses the default_search analyzer, so the search results are not as expected.

To Reproduce

  1. Create an index
PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "default_search": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}
  2. Index a document
POST test/_doc/1?refresh
{
  "text": "a-11"
}
  3. Search the index
POST test/_search
{
  "query": {
    "match": {
      "text": "a-11"
    }
  }
}

Nothing is returned.
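To see why this search finds nothing, here is a rough Python simulation of the two analysis chains involved. It is a sketch, not OpenSearch code: the regex in approx_standard_tokenizer only approximates the standard tokenizer (which follows Unicode UAX #29 word boundaries) and is meant to be accurate for simple ASCII input like "a-11". The function names are hypothetical.

```python
import re


def whitespace_analyzer(text):
    # Like the built-in whitespace analyzer: split on whitespace only, no lowercasing.
    return text.split()


def approx_standard_tokenizer(text):
    # Rough stand-in for the standard tokenizer on ASCII input:
    # keep runs of letters/digits, so "a-11" -> ["a", "11"].
    return re.findall(r"[A-Za-z0-9]+", text)


def edge_ngrams(token, min_gram=1, max_gram=20):
    # Mirrors autocomplete_filter: prefixes of length min_gram..max_gram.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def default_search_analyzer(text):
    # standard tokenizer -> lowercase -> autocomplete_filter (edge_ngram 1..20)
    tokens = [t.lower() for t in approx_standard_tokenizer(text)]
    return [gram for t in tokens for gram in edge_ngrams(t)]


index_terms = set(whitespace_analyzer("a-11"))    # terms written at index time
query_terms = default_search_analyzer("a-11")     # terms produced at search time

print(index_terms)                     # -> {'a-11'}
print(query_terms)                     # -> ['a', '1', '11']
print(index_terms & set(query_terms))  # -> set()
```

A match query (default OR operator) matches a document when any query term is among its indexed terms; the two term sets here are disjoint, so no hit is returned.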

Expected behavior

The analyzer defined in the mapping should take precedence over the default_search analyzer defined in the settings.

Host/Environment:

  • Version: 2.9

gaobinlong, Nov 06 '23 11:11

@gaobinlong this is expected behaviour: the analyzer parameter is the indexing analyzer, so search_analyzer should be set in the mappings instead:

"mappings": {
  "properties": {
    "text": {
      "type": "text",
      "analyzer": "whitespace",
      "search_analyzer": "whitespace"
    }
  }
}

reta, Nov 06 '23 14:11

I think that if no search_analyzer is specified, analyzer is used at both index time and search time. In the case above, if no default_search analyzer is defined in the settings, the search works as expected:

PUT test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}

POST test/_doc/1?refresh
{
  "text": "a-11"
}

POST test/_search
{
  "query": {
    "match": {
      "text": "a-11"
    }
  }
}


POST _analyze
{
  "text":"a-11",
  "analyzer": "whitespace"
}

And if we change default_search to default in the settings, it also works as expected: only the whitespace analyzer is used at both index time and search time:

PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}
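A minimal Python model of index-time analyzer selection, assuming the order described on the Elasticsearch "Specify an analyzer" page: the field's analyzer mapping parameter first, then an analyzer named default in the index settings, then the standard analyzer. It is not OpenSearch's implementation; the function name and dict shapes are hypothetical. It illustrates why default only affects fields that declare no analyzer of their own, so whitespace wins at index time in this configuration.

```python
def resolve_index_analyzer(field_mapping: dict, analysis_analyzers: dict) -> str:
    """Simplified model of index-time analyzer selection for a text field."""
    # 1. The field's own "analyzer" mapping parameter takes precedence.
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    # 2. Otherwise, an index-wide analyzer named "default" in the analysis settings.
    if "default" in analysis_analyzers:
        return "default"
    # 3. Assumed fallback for this sketch: the built-in standard analyzer.
    #    "default_search" is never consulted at index time in this model.
    return "standard"


autocomplete = {"tokenizer": "standard", "filter": ["lowercase", "autocomplete_filter"]}
field = {"type": "text", "analyzer": "whitespace"}

print(resolve_index_analyzer(field, {"default": autocomplete}))             # -> whitespace
print(resolve_index_analyzer({"type": "text"}, {"default": autocomplete}))  # -> default
```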

gaobinlong, Nov 08 '23 02:11

@gaobinlong shamelessly quoting the Elasticsearch docs [1] (whose behavior we have inherited): "At search time, Elasticsearch determines which analyzer to use by checking the following parameters in order:

  1. The analyzer parameter in the search query. See Specify the search analyzer for a query.

  2. The search_analyzer mapping parameter for the field. See Specify the search analyzer for a field.

  3. The analysis.analyzer.default_search index setting. See Specify the default search analyzer for an index.

  4. The analyzer mapping parameter for the field. See Specify the analyzer for a field."

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html
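The quoted search-time order can be modeled as a short Python function. This is a sketch of the documented precedence only, not OpenSearch's actual code; the function name and dict shapes are hypothetical, and the final fallback to the standard analyzer is an assumption of this sketch. It reproduces the reported behavior, the search_analyzer workaround, and the default variant from the thread.

```python
from typing import Optional


def resolve_search_analyzer(field_mapping: dict, analysis_analyzers: dict,
                            query_analyzer: Optional[str] = None) -> str:
    """Search-time analyzer choice, following the order quoted from the Elasticsearch docs."""
    # 1. The analyzer parameter in the search query.
    if query_analyzer is not None:
        return query_analyzer
    # 2. The search_analyzer mapping parameter for the field.
    if "search_analyzer" in field_mapping:
        return field_mapping["search_analyzer"]
    # 3. The analysis.analyzer.default_search index setting.
    if "default_search" in analysis_analyzers:
        return "default_search"
    # 4. The analyzer mapping parameter for the field.
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    # Assumed fallback for this sketch: the built-in standard analyzer.
    return "standard"


autocomplete = {"tokenizer": "standard", "filter": ["lowercase", "autocomplete_filter"]}
field = {"type": "text", "analyzer": "whitespace"}

# Reported setup: default_search (step 3) is reached before the field's analyzer (step 4).
print(resolve_search_analyzer(field, {"default_search": autocomplete}))  # -> default_search
# Workaround: an explicit search_analyzer (step 2) wins.
print(resolve_search_analyzer({**field, "search_analyzer": "whitespace"},
                              {"default_search": autocomplete}))         # -> whitespace
# "default" is not in the search-time list, so the field's analyzer is used.
print(resolve_search_analyzer(field, {"default": autocomplete}))         # -> whitespace
```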

reta, Nov 08 '23 14:11

Yeah -- while (perhaps) confusing, it's longstanding behavior that search analyzers will take precedence over the index-time analyzers.

Maybe we could define a new field-level analyzer parameter (field_analyzer?) that implicitly sets both the index and search analyzer for the field, such that it would override the index-wide default search analyzer.

msfroh, Nov 08 '23 18:11

In my understanding, the analyzer defined in the field's mapping is already field-level, so I don't see why the index-wide default_search analyzer should override the implicit search analyzer defined in the mapping. The ES documentation also says: "Unless overridden with the search_analyzer mapping parameter, this analyzer is used for both index and search analysis."

gaobinlong, Dec 01 '23 09:12

Closing this issue since we didn't reach consensus; I will open a new one if users complain about it.

gaobinlong, Jul 24 '24 05:07