[BUG] Updating a single document fails using the Update API

Open juntezhang opened this issue 2 years ago • 4 comments

What is the bug?

This bug has already been reported on the forums, but I am creating a bug report on GitHub for better visibility and tracking.

After successfully indexing (creating) a new document in the index using the neural ingest pipeline, updating the created document with the Update API fails.

This is the error that is returned:

{
    "error": {
        "root_cause": [
            {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse field [Field_vectorized] of type [knn_vector] in document with id '9'. Preview of field's value: 'null'"
            }
        ],
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [Field_vectorized] of type [knn_vector] in document with id '9'. Preview of field's value: 'null'",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Vector dimension mismatch. Expected: 384, Given: 768"
        }
    },
    "status": 400
}

How can one reproduce the bug?

Complete the Sease neural search plugin tutorial.

Then we can update an existing document like this:

POST http://localhost:9200/<INDEX>/_update/9

With the request body like this:

{ "doc": { "Body":"The New American frontier, also known as the Old West, popularly known as the Wild West, encompasses the geography, history, folklore, and culture associated with the forward wave of American expansion in mainland North America that began with European colonial settlements in the early 17th century and ended with the admission of the last few contiguous western territories as states in 1912. This era of massive migration and settlement was particularly encouraged by President Thomas Jefferson following the Louisiana Purchase, giving rise to the expansionist attitude known as \"Manifest Destiny\" and the historians' \"Frontier Thesis\". The legends, historical events and folklore of the American frontier have embedded themselves into United States culture so much so that the Old West, and the Western genre of media specifically, has become one of the defining periods of American national identity."}}

But it returns the error as shared above.
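For reference, the request above can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original report: the helper name `build_update_request` and the index name `my-index` are hypothetical (the issue only shows `<INDEX>`), and the `Body` text is shortened. The request is built but not sent.

```python
import json
import urllib.request


def build_update_request(base_url, index, doc_id, partial_doc):
    """Build (but do not send) a partial-document Update API request."""
    url = f"{base_url}/{index}/_update/{doc_id}"
    # The Update API expects the changed fields wrapped in a "doc" object.
    body = json.dumps({"doc": partial_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "my-index" is a hypothetical index name; the Body text is shortened.
req = build_update_request(
    "http://localhost:9200",
    "my-index",
    "9",
    {"Body": "The New American frontier ..."},
)
print(req.get_method(), req.full_url)
# To send it against a running cluster: urllib.request.urlopen(req)
```

Sending this request against an index that uses the neural ingest pipeline is what produced the `mapper_parsing_exception` shown earlier.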

What is the expected behavior?

The expected behavior is that the document gets indexed without errors.

What is your host/environment?

macOS 13.3, with OpenSearch running in Docker on Ubuntu.

Do you have any screenshots?

N/A

Do you have any additional context?

I am happy to contribute to a solution.

juntezhang avatar Jul 13 '23 16:07 juntezhang

@juntezhang Is this similar to this issue? https://github.com/opensearch-project/neural-search/issues/213

navneet1v avatar Jul 15 '23 02:07 navneet1v

It's related, but not the same. We can accept that updates don't work with the ingest pipeline. But doing a single call with the Update API is another case. Now we can only use the Index API, so we have to always send the whole document. Updating by script is not possible then.

juntezhang avatar Jul 18 '23 06:07 juntezhang

@juntezhang This may be a feature gap where the ingest pipeline does not work on Update API calls. I would recommend opening a GitHub issue in the OpenSearch core repo: https://github.com/opensearch-project/OpenSearch

navneet1v avatar Aug 16 '23 19:08 navneet1v

Thanks, it's good to know about this limitation. We have worked around the problem by using the Index API for updates instead. When the Update API is necessary, for example when updating by script, add the query parameter pipeline=_none to the Update API call. This bypasses the neural search pipeline, so you can keep updating docs by script.
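As a rough illustration of that workaround, here is a minimal Python sketch of a scripted update with pipeline=_none appended to the URL. The helper name `build_scripted_update`, the index name `my-index`, and the Painless script are assumptions made for illustration only; the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def build_scripted_update(base_url, index, doc_id, new_body, bypass_pipeline=True):
    """Build (but do not send) a scripted Update API request."""
    url = f"{base_url}/{index}/_update/{doc_id}"
    if bypass_pipeline:
        # pipeline=_none is the query parameter described in the comment above.
        url += "?" + urllib.parse.urlencode({"pipeline": "_none"})
    payload = {
        "script": {
            "lang": "painless",
            # Assumed script: overwrite the Body field with a parameter value.
            "source": "ctx._source.Body = params.body",
            "params": {"body": new_body},
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "my-index" is a hypothetical index name.
scripted = build_scripted_update("http://localhost:9200", "my-index", "9", "New text")
print(scripted.full_url)
```

With the pipeline bypassed, the embedding field is presumably not regenerated for the new text, so this only suits updates where that is acceptable.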

I wonder whether it would be possible to extend the plugin to not only rely on the Ingest Pipeline?

juntezhang avatar Aug 17 '23 09:08 juntezhang

This is the same issue as https://github.com/opensearch-project/neural-search/issues/207, which has already been fixed, so I am closing this issue now.

zane-neo avatar Sep 29 '24 09:09 zane-neo