ml-commons

[BUG] Incorrect validation logic for map type in xxxProcessor

Open zane-neo opened this issue 10 months ago • 1 comments

What is the bug? When a user supplies a map-type configuration (field_map in the example below) to several processors, validation can fail because it is also performed on extra fields in the source map that the configuration does not reference.

How can one reproduce the bug? Steps to reproduce the behavior:

Create the pipeline:

PUT /_ingest/pipeline/neural-search-pipeline-v2
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "WeliNowB6EaQJ_XFf05V",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}

And simulate the ingestion:

POST _ingest/pipeline/neural-search-pipeline-v2/_simulate
{
  "docs": [
    {
      "_index": "neural-search-index-v2",
      "_id": "1",
      "_source": {
        "category": {
          "id": 1,
          "name": {
            "en": "category 1"
          }
        }
      }
    }
  ]
}

The user then gets an error like the one below:

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "map type field [category] has non-string type, cannot process it"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "map type field [category] has non-string type, cannot process it"
      }
    }
  ]
}

What is the expected behavior? The correct embedding should be generated and inserted into the document.

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins

Do you have any screenshots? If applicable, add screenshots to help explain your problem.

Do you have any additional context? The root cause is that when map-type data is validated, not only the expected field but also unrelated fields are validated. In the example above, "id" is validated because it sits under the category map, and its value is an integer, which text embedding does not support, hence the error.
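
To illustrate the root cause, here is a minimal Python sketch. It is not the plugin's actual code (the processor itself is written in Java); the function names validate_all_fields and validate_mapped_fields are hypothetical, and the leaf rule (a mapped value must be a string) is a simplifying assumption. The first function mimics the reported behavior by walking every key under the source map; the second walks only the keys named in field_map, which is one possible way to get the expected behavior.

```python
def _fail(field_name):
    # Same message as the error reported above.
    raise ValueError(
        f"map type field [{field_name}] has non-string type, cannot process it"
    )


def validate_all_fields(value, field_name):
    """Mimics the reported behavior: every nested value is checked,
    including keys such as "id" that field_map never mentions."""
    if isinstance(value, dict):
        for child in value.values():
            validate_all_fields(child, field_name)
    elif not isinstance(value, str):
        _fail(field_name)


def _validate_branch(value, mapping, field_name):
    if isinstance(mapping, dict):
        # Intermediate level: descend only into keys named in the mapping.
        if not isinstance(value, dict):
            _fail(field_name)
        for key, sub_mapping in mapping.items():
            if key in value:
                _validate_branch(value[key], sub_mapping, field_name)
    elif not isinstance(value, str):
        # Leaf level (assumption): the mapping is the target vector field
        # name, so the source value must be text.
        _fail(field_name)


def validate_mapped_fields(source, field_map):
    """Hypothetical fix: validate only what field_map references."""
    for key, mapping in field_map.items():
        if key in source:
            _validate_branch(source[key], mapping, key)


field_map = {"category": {"name": {"en": "category_name_vector"}}}
source = {"category": {"id": 1, "name": {"en": "category 1"}}}

validate_mapped_fields(source, field_map)  # passes: "id" is never visited

try:
    validate_all_fields(source["category"], "category")
except ValueError as err:
    print(err)  # the unrelated integer "id" triggers the error
```

With the document from the reproduction steps, the scoped check accepts the input while the unscoped check rejects it because of the unrelated "id" field. A value that field_map does reference, such as category.name.en, is still rejected if it is not a string.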

zane-neo avatar Apr 10 '24 01:04 zane-neo

Should we move this issue to neural-search?

ylwu-amzn avatar Apr 26 '24 00:04 ylwu-amzn

Yes, will move this to neural-search.

zane-neo avatar May 08 '24 14:05 zane-neo