k-NN icon indicating copy to clipboard operation
k-NN copied to clipboard

[BUG] When submitting a KNN search with a filter that refers to a non existing field, a non-descriptive `failed to create query: Rewrite first` error is returned

Open de-code opened this issue 2 years ago • 6 comments

Describe the bug

When submitting a KNN search with a filter that refers to a non existing field, then I am getting a non-descriptive failed to create query: Rewrite first

To Reproduce Steps to reproduce the behavior:

  1. Create Index with KNN
PUT /test-index-knn1
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "vector": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene"
        }
      },
      "field1": {
        "type": "integer"
      }
    }
  }
}
  1. Query with non-existing field:
POST test-index-knn1/_search
{
  "query": {
    "knn": {
      "vector": {
        "vector": [0, 0, 0],
        "k": 10,
        "filter": {
          "bool": {
            "must": [
              {"range": {"non-existing-field1": {"gte": 1}}}
            ]
          }
        }
      }
    }
  }
}
  1. This results in failed to create query: Rewrite first error
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: Rewrite first",
        "index": "test-index-knn1",
        "index_uuid": "E5XiPqWQTuiQBsKTz2bEqw"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test-index-knn1",
        "node": "UOMb6bKjR1qZRdNKuM2ueQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: Rewrite first",
          "index": "test-index-knn1",
          "index_uuid": "E5XiPqWQTuiQBsKTz2bEqw",
          "caused_by": {
            "type": "illegal_state_exception",
            "reason": "Rewrite first"
          }
        }
      }
    ]
  },
  "status": 400
}

Expected behavior

Correct behaviour could be one of:

  • The error message should point out what field name doesn't exist . Similar to when specifying a non existing field as the vector.
  • No error message, similar to a non-KNN query

e.g. the following query:

POST test-index-knn1/_search
{
  "query": {
    "knn": {
      "non-existing-vector": {
        "vector": [0, 0, 0],
        "k": 10,
        "filter": {
          "bool": {
            "must": [
              {"range": {"field1": {"gte": 1}}}
            ]
          }
        }
      }
    }
  }
}

Returns:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: Field 'non-existing-vector' is not knn_vector type.",
        "index": "test-index-knn1",
        "index_uuid": "E5XiPqWQTuiQBsKTz2bEqw"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test-index-knn1",
        "node": "UOMb6bKjR1qZRdNKuM2ueQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: Field 'non-existing-vector' is not knn_vector type.",
          "index": "test-index-knn1",
          "index_uuid": "E5XiPqWQTuiQBsKTz2bEqw",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Field 'non-existing-vector' is not knn_vector type."
          }
        }
      }
    ]
  },
  "status": 400
}

For non-KNN query with non existing fields, there doesn't seem to be an error message. All of the following queries succeed:

POST test-index-knn1/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {"range": {"non-existing-field1": {"gte": 1}}}
          ]
        }
      }
    }
  }
}
POST test-index-knn1/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {"non-existing-field1": {"gte": 1}}}
      ]
    }
  }
}

Plugins No additional plugins

Screenshots n/a

Host/Environment (please complete the following information):

  • OS: Ubuntu
  • Version: 22.04

Using Docker image opensearchproject/opensearch:2.9.0

Additional context

de-code avatar Oct 25 '23 11:10 de-code

Interesting, Im seeing this exception for a lot of query types: https://github.com/search?q=repo%3Aopensearch-project%2FOpenSearch%20%22Rewrite%20first%22&type=code.

Specifically, for range: https://github.com/opensearch-project/OpenSearch/blob/8673fa937db405b8d614f8d4a02c0aa52587c037/server/src/main/java/org/opensearch/index/query/RangeQueryBuilder.java#L524. It seems like this should not get sent to user. Im wondering if we didnt rewrite somewhere.

jmazanec15 avatar Nov 02 '23 20:11 jmazanec15

Currently, it doesn't look like we rewrite the "filter" querybuilder. This needs to be done before generating the query. See this example: https://github.com/opensearch-project/OpenSearch/blob/8673fa937db405b8d614f8d4a02c0aa52587c037/server/src/main/java/org/opensearch/index/query/AbstractQueryBuilder.java#L250. The result will the underlying query builders for filters not being rewritten.

Im thinking we should override the doRewrite method and rewrite the filters. @navneet1v @martin-gaievski what do you think?

cc: @vamshin

jmazanec15 avatar Nov 02 '23 21:11 jmazanec15

@jmazanec15 isn't the filter query being rewritten from this call? https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNQueryFactory.java#L159C88-L159C88

navneet1v avatar Nov 02 '23 21:11 navneet1v

@navneet1v no I dont think so:

  1. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/RangeQueryBuilder.java#L524
  2. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/AbstractQueryBuilder.java#L117

jmazanec15 avatar Nov 02 '23 22:11 jmazanec15

@navneet1v no I dont think so:

  1. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/RangeQueryBuilder.java#L524
  2. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/AbstractQueryBuilder.java#L117

Thanks for providing the info @jmazanec15. I see you provided one solution and if that is only way to fix the issue then I see no concern. But during the implementation I would recommend to see if there are other alternatives, if no then we should go ahead with the solution you provided.

navneet1v avatar Nov 03 '23 07:11 navneet1v

Not sure on better alternatives, but I see bool query builder and disk max query builder take this approach:

  1. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/BoolQueryBuilder.java#L368-L402
  2. https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/DisMaxQueryBuilder.java#L209-L227

so it should give some confidence.

jmazanec15 avatar Nov 03 '23 18:11 jmazanec15