neural-search
[BUG] error on complex types `list type field [category] has empty string, cannot process it`
Initial bug reported in https://github.com/opensearch-project/ml-commons/issues/2303
What is the bug?
I am creating a text embedding processor that creates vectors on a nested field. However, I receive an illegal_argument_exception because not every field in the object is one of the supported types:
- string
- map
- string list
Here is the explanation from the AWS support specialist:
Our internal team informed me that this exception happened when the “id” under “brand” field has int value that is not supported by the text embedding processor from ingestion pipeline, and the fields inside the complex type must be of types: string, map or list.
However, I am not creating vectors on id, so I don't understand why it must follow these requirements. Is this expected behaviour or is this a bug?
How can one reproduce the bug? Steps to reproduce the behavior:
- create ingest pipeline
PUT /_ingest/pipeline/neural-search-pipeline-v2
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "WeliNowB6EaQJ_XFf05V",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}
- simulate ingest pipeline
POST _ingest/pipeline/neural-search-pipeline-v2/_simulate
{
  "docs": [
    {
      "_index": "neural-search-index-v2",
      "_id": "1",
      "_source": {
        "category": {
          "id": 1,
          "name": {
            "en": "category 1"
          }
        }
      }
    }
  ]
}
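To make it explicit which source field the pipeline above is configured to embed, here is a minimal Python sketch that flattens the nested field_map into dotted paths and looks them up in the simulated document. The helper names `mapped_paths` and `lookup` are hypothetical and written only for this illustration; this is not how the text embedding processor itself is implemented.

```python
from typing import Any, Dict, Iterator, Tuple


def mapped_paths(field_map: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (source_path, vector_field_name) for each leaf of a nested field_map."""
    for key, target in field_map.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(target, dict):
            # Nested object in the field_map: keep descending.
            yield from mapped_paths(target, path)
        else:
            # Leaf: the value is the name of the vector field to create.
            yield path, target


def lookup(doc: Dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path such as 'category.name.en' through nested dicts."""
    current: Any = doc
    for part in dotted_path.split("."):
        current = current[part]
    return current


# field_map and _source copied from the requests above.
field_map = {"category": {"name": {"en": "category_name_vector"}}}
document = {"category": {"id": 1, "name": {"en": "category 1"}}}

pairs = list(mapped_paths(field_map))
texts = {path: lookup(document, path) for path, _ in pairs}

print(pairs)
print(texts)
```

The only mapped path is category.name.en. category.id is never referenced by the field_map, which is why I did not expect its type to matter.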
What is the expected behavior? The pipeline should create vectors on the category name:
{
  "docs": [
    {
      "doc": {
        "_index": "neural-search-index-v2",
        "_id": "1",
        "_source": {
          "category": {
            "name": {
              "category_name_vector": [
                0.019107267,
                -0.029297447,
                0.0070927013,
                -0.022105217,
                ...
              ],
              "en": "category 1"
            },
            "id": 1
          }
        },
        "_ingest": {
          "timestamp": "2024-01-08T17:59:39.543401762Z"
        }
      }
    }
  ]
}
What is your host/environment?
- OS: AWS OpenSearch Service Managed Cluster
- Version 2.11
Do you have any screenshots?
{
  "failures": {
    "index": "neural-search-index-v2",
    "id": "5302821",
    "cause": {
      "type": "illegal_argument_exception",
      "reason": "list type field [category] has empty string, cannot process it"
    },
    "status": 400
  },
  ...
}
Do you have any additional context?
Invalid doc:
{
  "brand": {
    "id": 123, // cannot be integer
    "description": {
      "en": "en description female",
      "fr": "" // cannot be empty string
    }
    ...
  },
  "category": {
    "id": "123", // valid string
    "sizes": [
      "XS",
      "XL",
      "", // elements in list cannot be empty strings
      123 // elements in list cannot be integers
      ...
    ]
  }
}
Valid doc:
{
  "brand": {
    "id": "123",
    "description": {
      "en": "en description"
    }
    ...
  },
  "category": {
    "id": "123",
    "sizes": [ ], // empty list is valid
    "description": {
      // empty object is valid
    }
  }
}
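The rules I inferred from the two documents above can be written down as a small client-side check. This is only a sketch of the behaviour as I observed it, not the plugin's actual validation code. The function names are hypothetical, the sample documents contain only the fields shown above (the elided parts are left out), and two things are assumptions on my part: that every top-level object is checked, and that anything other than a string, map, or list of strings is rejected.

```python
from typing import Any, Dict, List, Tuple

Violation = Tuple[str, str]  # (dotted path, reason)


def find_violations(value: Any, path: str) -> List[Violation]:
    """Collect the values that the rules above would reject, with their paths."""
    if isinstance(value, str):
        # Strings are fine unless they are empty.
        return [] if value else [(path, "empty string")]
    if isinstance(value, dict):
        # Maps are checked recursively; an empty map yields no violations.
        found: List[Violation] = []
        for key, child in value.items():
            found.extend(find_violations(child, f"{path}.{key}"))
        return found
    if isinstance(value, list):
        # Lists may only hold non-empty strings; an empty list is fine.
        found = []
        for index, item in enumerate(value):
            item_path = f"{path}[{index}]"
            if not isinstance(item, str):
                found.append((item_path, f"list element of type {type(item).__name__}"))
            elif not item:
                found.append((item_path, "empty string in list"))
        return found
    # Assumption: every other type (int, float, bool, None, ...) is rejected.
    return [(path, f"unsupported type {type(value).__name__}")]


def check_document(doc: Dict[str, Any]) -> List[Violation]:
    """Run the check on every top-level field of a document."""
    found: List[Violation] = []
    for field, value in doc.items():
        found.extend(find_violations(value, field))
    return found


invalid_doc = {
    "brand": {"id": 123, "description": {"en": "en description female", "fr": ""}},
    "category": {"id": "123", "sizes": ["XS", "XL", "", 123]},
}
valid_doc = {
    "brand": {"id": "123", "description": {"en": "en description"}},
    "category": {"id": "123", "sizes": [], "description": {}},
}

for violation in check_document(invalid_doc):
    print(violation)
print(check_document(valid_doc))
```

Running a check like this before indexing at least shows which paths trigger the exception, even when they are not part of the field_map.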