neural-search
[BUG] _bulk update request failing when using text chunking processor pipeline
Describe the bug
When performing a _bulk update request against an index that uses the text chunking processor pipeline, the request fails with a 500 null_pointer_exception ("Cannot invoke \"Object.toString()\" because the return value of \"java.util.Map.get(Object)\" is null"; the full response is shown below). There is no error when the text chunking processor is not used, or when the document is updated through the regular update API (see the comparison example after the bulk response).
Example request:
curl -H "Content-Type: application/json" -X POST "https://localhost:9200/_bulk" -u "admin:xxxxx" --insecure -d '
{ "update": { "_id": "test", "_index": "docs-chunks" } }
{ "doc": { "text": "testing testing" }, "doc_as_upsert": true }
'
Example response:
{"took":0,"ingest_took":1,"errors":true,"items":[{"index":{"_index":null,"_id":null,"status":500,"error":{"type":"null_pointer_exception","reason":"Cannot invoke \"Object.toString()\" because the return value of \"java.util.Map.get(Object)\" is null"}}}]}
Related component
Indexing
To Reproduce
- Deploy a text embedding model
- Create a text chunking pipeline
- Create an index with the text chunking pipeline as its default pipeline
- Post a _bulk update request to the index (see the sketch after this list)
- The error above appears
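A minimal reproduction sketch, assuming a locally running cluster with default credentials, an already deployed text embedding model, and two hypothetical files pipeline.json and index.json that hold the "Text chunking pipeline" and "Index settings and mappings" bodies listed under Additional context:
# Create the ingest pipeline (body from pipeline.json, shown under Additional context)
curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/_ingest/pipeline/text-chunking-embedding-ingest-pipeline" -u "admin:xxxxx" --insecure -d @pipeline.json
# Create the index with the pipeline as its default pipeline (body from index.json)
curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/docs-chunks" -u "admin:xxxxx" --insecure -d @index.json
# Send the bulk update request; this returns the null_pointer_exception above
curl -H "Content-Type: application/json" -X POST "https://localhost:9200/_bulk" -u "admin:xxxxx" --insecure -d '
{ "update": { "_id": "test", "_index": "docs-chunks" } }
{ "doc": { "text": "testing testing" }, "doc_as_upsert": true }
'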
Expected behavior
Successfully update OpenSearch documents.
Additional Details
Plugins
[opensearch@opensearch-cluster-master-0 ~]$ bin/opensearch-plugin list
opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-cross-cluster-replication
opensearch-custom-codecs
opensearch-flow-framework
opensearch-geospatial
opensearch-index-management
opensearch-job-scheduler
opensearch-knn
opensearch-ml
opensearch-neural-search
opensearch-notifications
opensearch-notifications-core
opensearch-observability
opensearch-performance-analyzer
opensearch-reports-scheduler
opensearch-security
opensearch-security-analytics
opensearch-skills
opensearch-sql
Host/Environment (please complete the following information):
- OS: Amazon Linux
- Version: 2023
- Helm environment v2.20.0, running on k8s cluster v1.28.3
- OpenSearch v2.14.0
Additional context
ML model used: https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1
Text chunking pipeline:
{ "description": "A text chunking and embedding ingest pipeline", "processors": [ { "text_chunking": { "algorithm": { "fixed_token_length": { "token_limit": 350, "overlap_rate": 0.2, "tokenizer": "standard" } }, "field_map": { "text": "passage_chunk" } } }, { "text_embedding": { "model_id": "ueVVfo4Bvd-X9jaivNwl", "field_map": { "passage_chunk": "passage_embedding" } } } ] }
Index settings and mappings:
{ "settings": { "index": { "number_of_shards": 2, "number_of_replicas": 2, "knn": true, "default_pipeline": "text-chunking-embedding-ingest-pipeline", "analyze": { "max_token_count": 1000000 } } }, "mappings": { "properties": { "text": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "passage_embedding": { "type": "nested", "properties": { "knn": { "type": "knn_vector", "dimension": 384 } } } } } }