[BUG] mapper_parsing_exception: failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN'

Open lihuimingxs opened this issue 10 months ago • 5 comments

What is the bug? A clear and concise description of the bug.

In OpenSearch 2.12.0:

When I use the Java client's Bulk operation with IndexOperation to create or update documents, and the vectors are computed on GPU nodes, some documents fail with the exception Preview of field's value: 'NaN'. The error also occurs when indexing from a single thread. However, re-sending the same failed document succeeds.

When the cluster computes the vectors on CPU instead, the problem does not occur. My guess is therefore that vector computation on the GPU is unstable, but I cannot confirm this.

Here is my Java code and detailed exception information:

Java Code:

private void sendOpenSearch(List<EntityDoc> docList) {
    try {
        List<BulkOperation> operationList = new ArrayList<>(docList.size());
        for(EntityDoc doc : docList){
            BulkOperation operation = new BulkOperation.Builder()
                    .index(new IndexOperation.Builder<>()
                            .index(opensearchProperty.getRefreshIndex())
                            .id(doc.getId())
                            .document(doc)
                            .build())
                    .build();
            operationList.add(operation);
        }

        BulkRequest bulkRequest = new BulkRequest.Builder()
                .index(opensearchProperty.getRefreshIndex())
                .operations(operationList)
                .build();

        BulkResponse response = openSearchClient.bulk(bulkRequest);
        if(response.errors()){
            response.items().forEach( item ->{
                if(null != item.error() && null != item.error().causedBy()){
                    // One placeholder per argument, so that the reason is not silently dropped.
                    log.error("Exception id:{}, reason:{}", item.id(), item.error().causedBy().reason());
                }
            });
        }
    } catch (IOException e) {
        log.error("OpenSearch IO Exception",e);
    }
}

Exception:

2024-04-01 09:31:51,454 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-2] [] ERROR c.c.c.t.a.c.OpenSearchTalentConsumer - OpenSearch failed to save data
org.opensearch.client.opensearch._types.OpenSearchException: Request failed: [mapper_parsing_exception] failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN'
        at org.opensearch.client.transport.rest_client.RestClientTransport.getHighLevelResponse(RestClientTransport.java:270)
        at org.opensearch.client.transport.rest_client.RestClientTransport.performRequest(RestClientTransport.java:143)
        at org.opensearch.client.opensearch.OpenSearchClient.update(OpenSearchClient.java:1578)
        at com.ci.application.consumer.OpenSearchTalentConsumer.reSendOpensearch(OpenSearchTalentConsumer.java:97)
        at com.ci.application.consumer.OpenSearchTalentConsumer.sendOpensearch(OpenSearchTalentConsumer.java:85)
        at com.ci.application.consumer.OpenSearchTalentConsumer.consume(OpenSearchTalentConsumer.java:55)
        at jdk.internal.reflect.GeneratedMethodAccessor1421.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:171)
        at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:120)
        at org.springframework.amqp.rabbit.listener.adapter.HandlerAdapter.invoke(HandlerAdapter.java:49)
        at org.springframework.amqp.rabbit.listener.adapter.MessagingMessageListenerAdapter.invokeHandler(MessagingMessageListenerAdapter.java:190)
        at org.springframework.amqp.rabbit.listener.adapter.MessagingMessageListenerAdapter.onMessage(MessagingMessageListenerAdapter.java:127)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:1552)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.actualInvokeListener(AbstractMessageListenerContainer.java:1478)
        at jdk.internal.reflect.GeneratedMethodAccessor949.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.retry.interceptor.RetryOperationsInterceptor$1.doWithRetry(RetryOperationsInterceptor.java:91)
        at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287)
        at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:180)
        at org.springframework.retry.interceptor.RetryOperationsInterceptor.invoke(RetryOperationsInterceptor.java:115)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
        at org.springframework.amqp.rabbit.listener.$Proxy354.invokeListener(Unknown Source)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:1466)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:1461)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:1410)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doReceiveAndExecute(SimpleMessageListenerContainer.java:870)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.receiveAndExecute(SimpleMessageListenerContainer.java:854)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$1600(SimpleMessageListenerContainer.java:78)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.mainLoop(SimpleMessageListenerContainer.java:1137)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1043)
        at java.base/java.lang.Thread.run(Thread.java:829)

What is your host/environment?

  • OS: Linux CentOS 7.9
  • Version: 2.12.0
  • Plugins: ml_commons

lihuimingxs avatar Apr 08 '24 02:04 lihuimingxs

Are you using any ml-commons feature to generate this embedding? Can you give more details on how to reproduce this issue?

If you aren't using any models through ml-commons, maybe we can move this issue to the k-NN plugin?

dhrubo-os avatar Apr 09 '24 17:04 dhrubo-os

> Are you using any ml-commons feature to generate this embedding? Can you give more details on how to reproduce this issue?
>
> If you aren't using any models through ml-commons, maybe we can move this issue to the k-NN plugin?

I used my own custom model.

What other information do I need to provide?

lihuimingxs avatar Apr 18 '24 07:04 lihuimingxs

> failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN'

From the error, it looks like you are trying to save 'NaN' to the knn_vector field?
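
One way to answer this question is to inspect a vector for non-finite components before it reaches the knn_vector field, since the error shows that the field rejects NaN. The following is a minimal sketch, assuming the embedding is available to the client as a float[]. In this issue the vector may instead be computed inside the cluster, in which case the check would apply wherever the model output can be read back. VectorCheck and firstNonFinite are hypothetical names and are not part of OpenSearch or ml-commons.

```java
// Hypothetical helper for diagnosis only; not an OpenSearch or ml-commons API.
final class VectorCheck {
    private VectorCheck() {}

    /**
     * Returns the index of the first NaN or infinite component,
     * or -1 if every component is a finite number.
     */
    static int firstNonFinite(float[] vector) {
        for (int i = 0; i < vector.length; i++) {
            // Float.isFinite is false for NaN, +Infinity and -Infinity.
            if (!Float.isFinite(vector[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

Logging the document id together with the returned index whenever it is not -1 would show whether NaN is already present in the model output or only appears later.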

ylwu-amzn avatar Apr 26 '24 00:04 ylwu-amzn

> failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN'
>
> From the error, it looks like you are trying to save 'NaN' to the knn_vector field?

No. My embeddingContent field contains real data, not NaN. However, the vector computed from it came back as NaN, which caused the error. Without any change to the data, the same document is saved successfully after I increase the number of client retries.
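
The retry workaround described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code used in this issue: Retry and withBackoff are hypothetical names, the action stands for whatever call indexes the document, and the attempt count and delay are arbitrary. Retrying only hides the intermittent NaN; it does not explain it.

```java
import java.util.concurrent.Callable;

// Hypothetical helper; not part of the OpenSearch Java client.
final class Retry {
    private Retry() {}

    /**
     * Runs the action until it succeeds or maxAttempts is reached,
     * doubling the delay between attempts. Rethrows the last failure.
     */
    static <T> T withBackoff(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // exponential backoff
                }
            }
        }
        throw last;
    }
}
```

A bulk call reports per-item failures in the response rather than by throwing, so for the retry to trigger, the action passed in would have to throw when response.errors() is true.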

lihuimingxs avatar Apr 26 '24 01:04 lihuimingxs

Catch All Triage - 1 2 3 4 5 6

dblock avatar Jun 24 '24 16:06 dblock