ml-commons
[BUG] Incorrect validation logic for map type in xxxProcessor
What is the bug? When a user sets a map-type configuration in several processors, validation can fail because it is also applied to extra fields in that map that the configuration does not reference.
How can one reproduce the bug? Steps to reproduce the behavior: create a pipeline with a nested field_map:
PUT /_ingest/pipeline/neural-search-pipeline-v2
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "WeliNowB6EaQJ_XFf05V",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}
And simulate the ingestion:
POST _ingest/pipeline/neural-search-pipeline-v2/_simulate
{
  "docs": [
    {
      "_index": "neural-search-index-v2",
      "_id": "1",
      "_source": {
        "category": {
          "id": 1,
          "name": {
            "en": "category 1"
          }
        }
      }
    }
  ]
}
The simulate request then returns an error like the following:
{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "map type field [category] has non-string type, cannot process it"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "map type field [category] has non-string type, cannot process it"
      }
    }
  ]
}
What is the expected behavior? The correct embedding should be generated and inserted into the document.
Do you have any additional context?
The root cause is that when map-type data is validated, not only the expected field but also unrelated fields are validated. In the example above, "id" is validated because it sits under the category map, and its value is an integer, which is not supported in text embedding, hence the error.
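The root cause suggests a fix along these lines: drive the validation from the configured field_map rather than from the document, so that only the paths named in the configuration are ever inspected. The following is a minimal Python sketch of that idea, not the actual neural-search code. The function name `validate_mapped_fields`, the error messages, and the choice to silently skip fields missing from the document are all assumptions made for illustration.

```python
from typing import Any, Mapping


def validate_mapped_fields(
    source: Mapping[str, Any],
    field_map: Mapping[str, Any],
    path: str = "",
) -> None:
    """Validate only the source fields that field_map names (hypothetical helper)."""
    for key, target in field_map.items():
        if key not in source:
            continue  # Assumption: a field absent from the document is skipped.
        value = source[key]
        full_path = f"{path}.{key}" if path else key
        if isinstance(target, Mapping):
            # Nested configuration: descend, but only into the configured keys.
            if not isinstance(value, Mapping):
                raise ValueError(
                    f"field [{full_path}] is configured as a map but is not a map"
                )
            validate_mapped_fields(value, target, full_path)
        elif not isinstance(value, str):
            # Leaf configuration: the source value is the text to embed.
            raise ValueError(
                f"field [{full_path}] has non-string type, cannot process it"
            )


# The field_map and document from the reproduction steps above.
field_map = {"category": {"name": {"en": "category_name_vector"}}}
source = {"category": {"id": 1, "name": {"en": "category 1"}}}

# Passes: "id" is not named in field_map, so it is never inspected.
validate_mapped_fields(source, field_map)
```

With this approach the integer "id" no longer triggers an error, while a non-string value at a configured path such as category.name.en would still be rejected.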
Should we move this issue to neural-search?
Yes, will move this to neural-search.