neural-search [BUG] _bulk update request failing when using text chunking processor pipeline

Open janmederly opened this issue 8 months ago • 15 comments

Describe the bug

When I send a _bulk update request to an index whose default pipeline uses the text chunking processor, the request fails with {"took":0,"ingest_took":1,"errors":true,"items":[{"index":{"_index":null,"_id":null,"status":500,"error":{"type":"null_pointer_exception","reason":"Cannot invoke \"Object.toString()\" because the return value of \"java.util.Map.get(Object)\" is null"}}}]}. There is no error when the text chunking processor is not used, or when I use the regular update API instead of _bulk.

Example request:

curl -H "Content-Type: application/json" -X POST "https://localhost:9200/_bulk" -u "admin:xxxxx" --insecure -d '
{ "update": { "_id": "test", "_index": "docs-chunks"} }
{"doc": {"text": "testing testing"}, "doc_as_upsert": true}
'

Example response:

{"took":0,"ingest_took":1,"errors":true,"items":[{"index":{"_index":null,"_id":null,"status":500,"error":{"type":"null_pointer_exception","reason":"Cannot invoke \"Object.toString()\" because the return value of \"java.util.Map.get(Object)\" is null"}}}]}
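When triaging a response like the one above, it can help to list only the failed items. The following is a minimal sketch using the Python standard library; the helper name `failed_items` is hypothetical, and the response string is copied from this report.

```python
import json

# Response body copied verbatim from the report above.
RESPONSE = r'''{"took":0,"ingest_took":1,"errors":true,"items":[{"index":{"_index":null,"_id":null,"status":500,"error":{"type":"null_pointer_exception","reason":"Cannot invoke \"Object.toString()\" because the return value of \"java.util.Map.get(Object)\" is null"}}}]}'''


def failed_items(bulk_response):
    """Return one summary dict per failed item in a parsed _bulk response."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a single-key object: {"<action>": {...result...}}.
        for action, result in item.items():
            error = result.get("error")
            if error is not None:
                failures.append({
                    "action": action,
                    "index": result.get("_index"),
                    "id": result.get("_id"),
                    "status": result.get("status"),
                    "type": error.get("type"),
                    "reason": error.get("reason"),
                })
    return failures


for failure in failed_items(json.loads(RESPONSE)):
    print(failure)
```

Note that the failed item is reported under an "index" key with null _index and _id, even though the request used the update action and set both values.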

Related component

Indexing

To Reproduce

Deploy a text embedding model
Create a text chunking pipeline
Create an index with the text chunking pipeline as its default pipeline
Post a bulk update request to that index
The error shown above is returned
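The bulk update in the fourth step, and the single-document update that works, can be sketched as follows. This is only an illustration of the two request shapes: the function names are hypothetical, the index, ID, and document come from the example request in this report, and nothing is sent over the network.

```python
import json


def bulk_update_body(index, doc_id, doc):
    """Build the NDJSON body for a one-item _bulk update with doc_as_upsert."""
    action = {"update": {"_id": doc_id, "_index": index}}
    payload = {"doc": doc, "doc_as_upsert": True}
    # _bulk expects one JSON object per line, and a newline after the last line.
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"


def single_update_request(index, doc_id, doc):
    """Build the path and body of the equivalent single-document update."""
    path = f"/{index}/_update/{doc_id}"
    body = json.dumps({"doc": doc, "doc_as_upsert": True})
    return path, body


doc = {"text": "testing testing"}

# Request that fails in this report: POST /_bulk
print(bulk_update_body("docs-chunks", "test", doc), end="")

# Request that reportedly succeeds: POST /docs-chunks/_update/test
print(single_update_request("docs-chunks", "test", doc))
```

Both requests carry the same partial document and the same doc_as_upsert flag, so the difference is in how the request reaches the ingest pipeline.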

Expected behavior

The bulk update request successfully updates the OpenSearch documents.

Additional Details

Plugins

[opensearch@opensearch-cluster-master-0 ~]$ bin/opensearch-plugin list
opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-cross-cluster-replication
opensearch-custom-codecs
opensearch-flow-framework
opensearch-geospatial
opensearch-index-management
opensearch-job-scheduler
opensearch-knn
opensearch-ml
opensearch-neural-search
opensearch-notifications
opensearch-notifications-core
opensearch-observability
opensearch-performance-analyzer
opensearch-reports-scheduler
opensearch-security
opensearch-security-analytics
opensearch-skills
opensearch-sql

Host/Environment (please complete the following information):

OS: Amazon Linux
Version: 2023
Helm environment v2.20.0, running on k8s cluster v1.28.3
OpenSearch v2.14.0

Additional context

ML model used: https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1

Text chunking pipeline:

{
  "description": "A text chunking and embedding ingest pipeline",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 350,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "text": "passage_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "ueVVfo4Bvd-X9jaivNwl",
        "field_map": {
          "passage_chunk": "passage_embedding"
        }
      }
    }
  ]
}
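To make the token_limit and overlap_rate values concrete, here is a simplified illustration of fixed-length chunking with overlap. It is not the plugin's implementation. It assumes the overlap is floor(token_limit * overlap_rate) tokens, so that consecutive chunks start token_limit minus overlap tokens apart, and it splits on whitespace instead of using the standard tokenizer. The function names are hypothetical.

```python
import math


def window_step(token_limit, overlap_rate):
    """Tokens between the starts of consecutive chunks (assumed formula)."""
    overlap = math.floor(token_limit * overlap_rate)
    step = token_limit - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than token_limit")
    return step


def chunk_tokens(tokens, token_limit, overlap_rate):
    """Split tokens into windows of at most token_limit tokens with overlap."""
    step = window_step(token_limit, overlap_rate)
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + token_limit])
        if start + token_limit >= len(tokens):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# Values from the pipeline in this report: 350-token windows, 0.2 overlap rate.
print(window_step(350, 0.2))

# The sample text from the failing request, split on whitespace.
print(chunk_tokens("testing testing".split(), 350, 0.2))
```

Under these assumptions the two-token sample text fits in a single window, so the failing request is nowhere near the configured token limit.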

Index settings and mappings:

{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 2,
      "knn": true,
      "default_pipeline": "text-chunking-embedding-ingest-pipeline",
      "analyze": {
        "max_token_count": 1000000
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "passage_embedding": {
        "type": "nested",
        "properties": {
          "knn": {
            "type": "knn_vector",
            "dimension": 384
          }
        }
      }
    }
  }
}
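As a quick sanity check on the mapping, the sketch below walks the properties tree and lists every knn_vector field with its dimension. The helper name is hypothetical and the mapping is copied from this report. The declared dimension of 384 matches the 384-dimensional output of multi-qa-MiniLM-L6-cos-v1, so a dimension mismatch does not explain the error.

```python
def knn_vector_dimensions(properties, prefix=""):
    """Walk a mappings 'properties' tree and return {field_path: dimension}."""
    found = {}
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "knn_vector":
            found[path] = spec.get("dimension")
        # Object and nested fields keep their children under "properties".
        if "properties" in spec:
            found.update(knn_vector_dimensions(spec["properties"], path + "."))
    return found


# Mappings copied from the index definition in this report.
MAPPINGS = {
    "properties": {
        "text": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
        "passage_embedding": {
            "type": "nested",
            "properties": {"knn": {"type": "knn_vector", "dimension": 384}},
        },
    }
}

print(knn_vector_dimensions(MAPPINGS["properties"]))
```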

Jun 19 '24 13:06 janmederly
