[FEATURE] Enhance ML Inference Search Request Processor to carry over the query metadata fields.

Open mingshl opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Currently, when the ML Inference Search Request Processor rewrites a query into a different query type, users can build the new query with the query_template parameter when configuring the processor. For example, a neural search query can be rewritten into a knn query.

However, the query metadata fields in the search request body (top-level fields outside query, such as _source) are ignored by the rewrite.

For example, configure an ML Inference Search Request Processor with a Cohere embedding model (cohere.ai/v1/embed) as below:

PUT /_search/pipeline/my_pipeline_neural_search
{
  "request_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run ml inference during search request",
        "model_id": "K7WVcZEBXV92Z6odCZGJ",
        "query_template": """{
                              "query": {
                                "knn": {
                                  "review_embedding": {
                                    "vector": ${modelPredictionOutcome},
                                    "k": 5
                                  }
                                }
                              }
                            }""",
        "function_name": "REMOTE",
        "input_map": [
          {
            "texts": "query.neural.review_embedding.query_text"
          }
        ],
        "output_map": [
          {
            "modelPredictionOutcome": "embeddings[0]"
          }
        ],
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}

A common use case is to search with a query string:

GET /review_string_index/_search?search_pipeline=my_pipeline_neural_search
{
  "query": {
    "neural": {
      "review_embedding": {
        "query_text": "good review",
        "k": 5
      }
    }
  }
}

The processor rewrites it to:

GET /review_string_index/_search 
{
  "query": {
    "knn": {
      "review_embedding": {
        "vector": "<model inference vector>",
        "k": 5
      }
    }
  }
}
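
The rewrite can be pictured with a small Python sketch. This is only an illustration of the behavior described in this issue, not the plugin's actual implementation: rewrite_request, get_by_path, and fake_embed are hypothetical names, and the model call is replaced by a stub that returns a fixed vector. Only the template text, the ${modelPredictionOutcome} placeholder, and the input path come from the pipeline configuration above.

```python
import json

# Template text copied from the pipeline's query_template.
QUERY_TEMPLATE = """{
  "query": {
    "knn": {
      "review_embedding": {
        "vector": ${modelPredictionOutcome},
        "k": 5
      }
    }
  }
}"""


def get_by_path(doc, dotted_path):
    """Walk a dotted path such as 'query.neural.review_embedding.query_text'."""
    node = doc
    for key in dotted_path.split("."):
        node = node[key]
    return node


def rewrite_request(request_body, embed):
    """Hypothetical rewrite: the new body is built from the template alone."""
    text = get_by_path(request_body, "query.neural.review_embedding.query_text")
    vector = embed(text)  # stand-in for the remote embedding model call
    rendered = QUERY_TEMPLATE.replace("${modelPredictionOutcome}", json.dumps(vector))
    return json.loads(rendered)


def fake_embed(text):
    """Stub model: returns a fixed vector regardless of the input text."""
    return [0.1, 0.2, 0.3]


original = {
    "query": {
        "neural": {"review_embedding": {"query_text": "good review", "k": 5}}
    }
}
rewritten = rewrite_request(original, fake_embed)
print(json.dumps(rewritten, indent=2))
```

Because the output in this sketch is rendered from the template only, nothing in the incoming body other than the mapped input value reaches the rewritten request.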

However, if I add the metadata field _source to the search request body, for example:

GET /review_string_index/_search?search_pipeline=my_pipeline_neural_search
{
  "_source": {
    "excludes": [
      "review_embedding"
    ]
  },
  "query": {
    "neural": {
      "review_embedding": {
        "query_text": "good review",
        "model_id": "K7WVcZEBXV92Z6odCZGJ",
        "k": 5
      }
    }
  }
}

the processor still rewrites it to the same knn query, and the _source metadata field is ignored:

GET /review_string_index/_search 
{
  "query": {
    "knn": {
      "review_embedding": {
        "vector": "<model inference vector>",
        "k": 5
      }
    }
  }
}

What solution would you like? Add a parameter to opt in to carrying over request body fields other than the query itself, including _source, sort, search_after, etc.
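
One way to picture the requested behavior is the Python sketch below. It is an assumption about how such an opt-in could work, not an existing processor option: carry_over_metadata and CARRY_OVER_FIELDS are hypothetical names, and the choice to let fields set by the query_template win over the original request is also an assumption.

```python
import copy

# Hypothetical opt-in list; the issue names these three fields as examples.
CARRY_OVER_FIELDS = ("_source", "sort", "search_after")


def carry_over_metadata(original_body, rewritten_body, fields=CARRY_OVER_FIELDS):
    """Copy selected top-level fields from the original request into the rewritten one.

    Fields already present in the rewritten body (set by the template) are kept as-is.
    """
    merged = copy.deepcopy(rewritten_body)
    for field in fields:
        if field in original_body and field not in merged:
            merged[field] = copy.deepcopy(original_body[field])
    return merged


original_request = {
    "_source": {"excludes": ["review_embedding"]},
    "query": {
        "neural": {"review_embedding": {"query_text": "good review", "k": 5}}
    },
}
rewritten_request = {
    "query": {
        "knn": {"review_embedding": {"vector": [0.1, 0.2, 0.3], "k": 5}}
    }
}

merged_request = carry_over_metadata(original_request, rewritten_request)
print(merged_request)
```

With this merge, the rewritten knn query keeps the _source excludes from the original request, while the query itself still comes from the template.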

What alternatives have you considered? A clear and concise description of any alternative solutions or features you've considered.

Do you have any additional context? Add any other context or screenshots about the feature request here.

mingshl, Aug 20 '24 21:08