elasticsearch
[CI] IndexingIT testIndexing {upgradedNodes=2} failing
Build scan: https://gradle-enterprise.elastic.co/s/ened3yzjpisnq/tests/:qa:rolling-upgrade:v8.10.3%23bwcTest/org.elasticsearch.upgrades.IndexingIT/testIndexing%20%7BupgradedNodes=2%7D
Reproduction line:
./gradlew ':qa:rolling-upgrade:v8.10.3#bwcTest' -Dtests.class="org.elasticsearch.upgrades.IndexingIT" -Dtests.method="testIndexing {upgradedNodes=2}" -Dtests.seed=53F4AE70D163BC85 -Dtests.bwc=true -Dtests.locale=ar-AE -Dtests.timezone=Africa/Mogadishu -Druntime.java=21
Applicable branches: main
Reproduces locally?: Didn't try
Failure history:
Failure dashboard for org.elasticsearch.upgrades.IndexingIT#testIndexing {upgradedNodes=2}
Failure excerpt:
org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:34545], URI [/_cluster/health?wait_for_nodes=3&wait_for_status=yellow], status line [HTTP/1.1 408 Request Timeout]
{"cluster_name":"test-cluster","status":"red","timed_out":true,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":5,"active_shards":10,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":83.33333333333334}
at __randomizedtesting.SeedInfo.seed([53F4AE70D163BC85:B72EE179F716FF0E]:0)
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:351)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:317)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:292)
at org.elasticsearch.upgrades.IndexingIT.testIndexing(IndexingIT.java:63)
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.lang.reflect.Method.invoke(Method.java:580)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:47)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
at java.lang.Thread.run(Thread.java:1583)
Pinging @elastic/es-distributed (Team:Distributed)
Hm actually this whole suite seems pretty damn flaky right now:
See e.g. latest failure: https://gradle-enterprise.elastic.co/s/wdzyszxt4ckmw
Looks to be a TSDB issue:
[2024-02-13T11:42:27,715][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v8.12.2-0] fatal error in thread [elasticsearch[v8.12.2-0][generic][T#5]], exiting
java.lang.AssertionError: unexpected failure while replicating translog entry
at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$indexTranslogOperations$4(RecoveryTarget.java:463) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:270) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.RecoveryTarget.indexTranslogOperations(RecoveryTarget.java:432) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.performTranslogOps(PeerRecoveryTargetService.java:611) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.handleRequest(PeerRecoveryTargetService.java:565) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.handleRequest(PeerRecoveryTargetService.java:557) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:644) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRequestHandler.messageReceived(PeerRecoveryTargetService.java:631) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.transport.InboundHandler.doHandleRequest(InboundHandler.java:288) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:301) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
Caused by: org.elasticsearch.index.mapper.DocumentParsingException: [1:124] failed to parse: _id must be unset or set to [AAAAAKuNxDxOOw8DAAABhW0z9gA] but was [4tlSJauNxDxOOw8DAAABhW0z9gA] because [locations] i
at org.elasticsearch.index.mapper.DocumentParser.wrapInDocumentParsingException(DocumentParser.java:246) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:153) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:96) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:96) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:1031) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:970) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1944) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1931) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$indexTranslogOperations$4(RecoveryTarget.java:457) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
... 15 more
Caused by: java.lang.IllegalArgumentException: _id must be unset or set to [AAAAAKuNxDxOOw8DAAABhW0z9gA] but was [4tlSJauNxDxOOw8DAAABhW0z9gA] because [locations] is in time_series mode
at org.elasticsearch.index.mapper.TsidExtractingIdFieldMapper.createField(TsidExtractingIdFieldMapper.java:120) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.mapper.TimeSeriesIdFieldMapper.postParse(TimeSeriesIdFieldMapper.java:140) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:150) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:96) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:96) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:1031) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:970) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1944) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyTranslogOperation(IndexShard.java:1931) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$indexTranslogOperations$4(RecoveryTarget.java:457) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
... 15 more
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Caused by: java.lang.IllegalArgumentException: _id must be unset or set to [AAAAAKuNxDxOOw8DAAABhW0z9gA] but was [4tlSJauNxDxOOw8DAAABhW0z9gA] because [locations] is in time_series mode
I think locations should be an index, but I don't see this index being created in any of the tests in this qa module.
One theory I tried to verify is that a recent change somehow produces a slightly different id for tsdb documents. I indexed a document on a pre-8.13 version and checked whether its id is the same after upgrading to 8.13/8.14, but I've been unsuccessful so far.
I think the failure with the locations index occurred in the mixed cluster QA module: https://github.com/elastic/elasticsearch/blob/ac574acca98d34838cd28ffee547bfdd90e00885/rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/tsdb/130_position_fields.yml#L9
The failure that occurred in the rolling upgrade module was:
Caused by: org.elasticsearch.index.mapper.DocumentParsingException: [1:62] failed to parse: _id must be unset or set to [AAAAAECsHS75xpHXAAABdrs-cAA] but was [itaMgECsHS75xpHXAAABdrs-cAA] because [tsdb] is in time_series mode
An _id of AAAAAECsHS75xpHXAAABdrs-cAA can be generated from IndexingIT without dimensions and with a timestamp of 1609459200000. I am still working on connecting these failure events.
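As a rough sanity check of the timestamp part of that observation, here is a minimal sketch. It assumes the _id is unpadded URL-safe Base64 and that its last 8 bytes hold the timestamp as a big-endian long in milliseconds; that layout is inferred from decoding the id above, not taken from the Elasticsearch source. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

public class TsdbIdTimestamp {

    // Decodes the id (URL-safe Base64, padding optional) and reads the
    // trailing 8 bytes as a big-endian long. That those bytes are the
    // timestamp is an assumption, not a documented guarantee.
    public static long trailingMillis(String id) {
        byte[] raw = Base64.getUrlDecoder().decode(id);
        return ByteBuffer.wrap(raw, raw.length - Long.BYTES, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        // Expected id from the rolling upgrade failure.
        System.out.println(trailingMillis("AAAAAECsHS75xpHXAAABdrs-cAA")); // prints 1609459200000
        // Id that was actually supplied; it carries the same trailing bytes.
        System.out.println(trailingMillis("itaMgECsHS75xpHXAAABdrs-cAA")); // prints 1609459200000
    }
}
```

Both ids decode to the same trailing value, 1609459200000 (2021-01-01T00:00:00Z), so under this assumption the two ids agree on the timestamp and differ only earlier in the byte sequence.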
See e.g. latest failure: https://gradle-enterprise.elastic.co/s/wdzyszxt4ckmw
This build scan is from an unmerged PR (https://github.com/elastic/elasticsearch/pull/105073), which we know has some issues. I've relabelled this issue.
I think Nhat is correct... I can spot the pattern "AAAAA" at the beginning of the id, which means that a buffer containing only 0-value bytes has been Base64 encoded. So the issue is probably caused by having no dimensions.
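The "AAAAA" observation can be checked with a small sketch like the one below. It assumes the ids are unpadded URL-safe Base64. The 4-byte prefix length is an assumption inferred from where the two ids in the locations failure diverge, not from the Elasticsearch source, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class TsdbIdPrefix {

    // True if the first n decoded bytes of the id are all zero.
    public static boolean startsWithZeroBytes(String id, int n) {
        byte[] raw = Base64.getUrlDecoder().decode(id);
        for (int i = 0; i < n; i++) {
            if (raw[i] != 0) {
                return false;
            }
        }
        return true;
    }

    // True if the two ids decode to identical bytes after skipping the first n.
    public static boolean sameTail(String a, String b, int n) {
        byte[] ra = Base64.getUrlDecoder().decode(a);
        byte[] rb = Base64.getUrlDecoder().decode(b);
        return Arrays.equals(
            Arrays.copyOfRange(ra, n, ra.length),
            Arrays.copyOfRange(rb, n, rb.length));
    }

    public static void main(String[] args) {
        String expected = "AAAAAKuNxDxOOw8DAAABhW0z9gA"; // id the mapper wanted
        String actual = "4tlSJauNxDxOOw8DAAABhW0z9gA";   // id found in the translog entry

        // Zero bytes encode to 'A' characters in Base64.
        System.out.println(Base64.getUrlEncoder().withoutPadding().encodeToString(new byte[4])); // prints AAAAAA

        System.out.println(startsWithZeroBytes(expected, 4)); // prints true
        System.out.println(startsWithZeroBytes(actual, 4));   // prints false
        System.out.println(sameTail(expected, actual, 4));    // prints true
    }
}
```

Under these assumptions the two ids differ only in their first 4 bytes, which are all zero in the expected id and non-zero in the one that was rejected.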
A lot more failures, on 8.10.4: https://gradle-enterprise.elastic.co/s/4w5djc46sngzk
The issue was initially reported due to failures from an unmerged PR. Therefore, I am closing this issue.