semantic_text ingestion inference integration test

Open carlosdelest opened this issue 9 months ago • 4 comments

Adds an integration test (IT) for bulk ingestion using semantic_text.

The IT mixes bulk operations (index, update, upsert) on an index and then checks the number of documents in the index. It relies heavily on randomness to get test coverage.

carlosdelest • May 13 '24 09:05

@elasticmachine run elasticsearch-ci/part-1

carlosdelest • May 13 '24 21:05

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine • May 14 '24 07:05

Have you tested this multiple times locally? I ran it about 100 times and got a couple of failures.

org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterIT > testBulkOperations {seed=[B370AC9CED1A493:A83B5D3F6BB59398]} FAILED
    java.lang.AssertionError: Failed to index document 240: org.elasticsearch.index.mapper.DocumentParsingException: [14:1] failed to parse field [dense_field] of type [semantic_text] in document with id '240'. Preview of field's value: 'null'
        at __randomizedtesting.SeedInfo.seed([B370AC9CED1A493:A83B5D3F6BB59398]:0)
        at org.elasticsearch.test.ESTestCase.fail(ESTestCase.java:2175)
        at org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterIT.testBulkOperations(ShardBulkInferenceActionFilterIT.java:112)

        Caused by:
        org.elasticsearch.index.mapper.DocumentParsingException: [14:1] failed to parse field [dense_field] of type [semantic_text] in document with id '240'. Preview of field's value: 'null'
            at app//org.elasticsearch.index.mapper.FieldMapper.rethrowAsDocumentParsingException(FieldMapper.java:233)
            at app//org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:186)
            at app//org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:417)
            at app//org.elasticsearch.index.mapper.DocumentParser.doParseObject(DocumentParser.java:483)
            at app//org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:471)
            at app//org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:338)
            at app//org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:299)
            at app//org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:139)
            at app//org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:86)
            at app//org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:92)
            at app//org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:1038)
            at app//org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:979)
            at app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:923)
            at app//org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:374)
            at app//org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:230)
            at app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
            at app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:300)
            at app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:151)
            at app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:79)
            at app//org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:217)
            at app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
            at app//org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
            at app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
            at app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
            at java.base@21/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
            at java.base@21/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
            at java.base@21/java.lang.Thread.run(Thread.java:1583)

            Caused by:
            java.lang.IllegalArgumentException: The [cosine] similarity does not support vectors with zero magnitude. Preview of invalid vector: [0.0]
                at org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper$ElementType$2.checkVectorMagnitude(DenseVectorFieldMapper.java:569)
                at org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper$ElementType$2.parseKnnVectorAndIndex(DenseVectorFieldMapper.java:593)
                at org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.parseKnnVectorAndIndex(DenseVectorFieldMapper.java:1463)
                at org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.parse(DenseVectorFieldMapper.java:1456)
                at org.elasticsearch.xpack.inference.mapper.SemanticTextFieldMapper.parseCreateField(SemanticTextFieldMapper.java:245)
                at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:184)
                ... 25 more

benwtrent • May 14 '24 12:05
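
For context on the root cause in the trace above: the vector that reached the dense vector mapper was [0.0], and cosine similarity is undefined for a vector with zero magnitude because it divides by the vector norm. One way a randomized test could avoid this, sketched here as an assumption and not as the fix adopted in this PR, is to redraw random vectors until the magnitude is non-zero. The names `NonZeroVectorSketch`, `squaredMagnitude`, and `randomNonZeroVector` are hypothetical and are not Elasticsearch APIs.

```java
import java.util.Random;

// Hypothetical sketch: not part of Elasticsearch or of the test in this PR.
class NonZeroVectorSketch {

    // Squared L2 norm, accumulated in double so tiny float values do not underflow to zero.
    static double squaredMagnitude(float[] vector) {
        double sum = 0.0;
        for (float v : vector) {
            sum += (double) v * v;
        }
        return sum;
    }

    // Draws random vectors with components in [-1, 1) until one has non-zero magnitude.
    static float[] randomNonZeroVector(Random random, int dimensions) {
        if (dimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        float[] vector = new float[dimensions];
        do {
            for (int i = 0; i < dimensions; i++) {
                vector[i] = random.nextFloat() * 2f - 1f;
            }
        } while (squaredMagnitude(vector) == 0.0);
        return vector;
    }

    public static void main(String[] args) {
        // A one-dimensional vector, matching the shape of the [0.0] vector in the trace.
        float[] vector = randomNonZeroVector(new Random(), 1);
        System.out.println("squared magnitude: " + squaredMagnitude(vector));
    }
}
```

A zero vector is rare with random floats, which fits a failure that shows up only a couple of times in about 100 runs. The redraw loop trades a small amount of determinism in the number of random draws for a guarantee that the generated vector is valid for cosine similarity.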

> Have you tested this multiple times locally? I ran it about 100 times and got a couple of failures.

I did, using the @Repeat annotation, and didn't catch those. I've now been able to reproduce the failure as well :( Thanks!

carlosdelest • May 14 '24 13:05