[CI] RestoreTemplateWithMatchOnlyTextMapperIT test failing

Open gmarouli opened this issue 10 months ago • 8 comments

While working on #107281, we noticed that running testPartialRestoreSnapshotThatIncludesDataStream with the global state included fails with an assertion error saying that the cluster state size does not match.

In https://github.com/elastic/elasticsearch/pull/107514 we introduce a test that reproduces this issue without any other changes. Based on other similar failures (#59140), we believe that something is being mutated on the local node after the restore, but this is only an assumption.

We are not sure whether this is an issue in restore or in mapping, which is why we added both labels. Please feel free to change them accordingly.

Build scan: https://gradle-enterprise.elastic.co/s/ceaa7hizih32e/tests/:modules:mapper-extras:internalClusterTest/org.elasticsearch.index.mapper.RestoreTemplateWithMatchOnlyTextMapperIT/test

Reproduction line:

./gradlew ':modules:mapper-extras:internalClusterTest' --tests "org.elasticsearch.index.mapper.RestoreTemplateWithMatchOnlyTextMapperIT.test" -Dtests.seed=A1FA43D50CA25035 -Dtests.locale=uk -Dtests.timezone=America/Yellowknife -Druntime.java=21

Applicable branches: main

Reproduces locally?: Yes

Failure history: Failure dashboard for org.elasticsearch.index.mapper.RestoreTemplateWithMatchOnlyTextMapperIT#test

Failure excerpt:

java.lang.AssertionError: cluster state size does not match expected:<4659> but was:<4669>

  at __randomizedtesting.SeedInfo.seed([A1FA43D50CA25035:29AE7C0FA25E3DCD]:0)
  at org.junit.Assert.fail(Assert.java:89)
  at org.junit.Assert.failNotEquals(Assert.java:835)
  at org.junit.Assert.assertEquals(Assert.java:647)
  at org.elasticsearch.test.ESIntegTestCase.ensureClusterStateConsistency(ESIntegTestCase.java:1235)
  at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:586)
  at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2331)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

gmarouli avatar Apr 16 '24 09:04 gmarouli

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine avatar Apr 16 '24 09:04 elasticsearchmachine

Pinging @elastic/es-storage-engine (Team:StorageEngine)

elasticsearchmachine avatar Apr 16 '24 09:04 elasticsearchmachine

Pinging @elastic/es-distributed (Team:Distributed)

elasticsearchmachine avatar Apr 16 '24 09:04 elasticsearchmachine

@gmarouli it seems that the test passes if you wrap the mappings inside _doc.

{
  "_doc": {
    "properties": {
      "@timestamp": {
        "format": "date_optional_time||epoch_millis",
        "type": "date"
      },
      "flag": {
        "type": "boolean"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}

arteam avatar May 03 '24 13:05 arteam

If you store the compressed mapping in Template without _doc, for some reason it reads back the compressed mapping wrapped in _doc.

arteam avatar May 03 '24 13:05 arteam

https://github.com/elastic/elasticsearch/pull/78746 seems to be related to this issue

arteam avatar May 06 '24 09:05 arteam

It looks like the core issue is that MetadataIndexTemplateService#wrapMappingsIfNecessary and Template#reduceMappings alter the mappings, which is why we end up with templates of different sizes.
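
To illustrate why wrapping alone changes the serialized size, here is a minimal standalone sketch. It is not the Elasticsearch code: `MappingWrapSketch`, `wrapInDoc` and `serializedSize` are hypothetical names, and the size is simply the UTF-8 byte length of the JSON rather than the real wire format.

```java
import java.nio.charset.StandardCharsets;

public class MappingWrapSketch {

    // Hypothetical stand-in for a "wrap if necessary" step: adds a "_doc"
    // type object around the mapping unless it is already wrapped.
    static String wrapInDoc(String mappingJson) {
        String trimmed = mappingJson.trim();
        if (trimmed.startsWith("{\"_doc\":")) {
            return trimmed;
        }
        return "{\"_doc\":" + trimmed + "}";
    }

    // Assumption: size is measured as the UTF-8 byte length of the JSON text.
    static int serializedSize(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String unwrapped = "{\"properties\":{\"message\":{\"type\":\"match_only_text\"}}}";
        String wrapped = wrapInDoc(unwrapped);

        // Both strings describe the same fields, but their sizes differ.
        System.out.println("unwrapped size: " + serializedSize(unwrapped));
        System.out.println("wrapped size:   " + serializedSize(wrapped));
    }
}
```

If one node holds the unwrapped form and another holds the wrapped form, a check that compares serialized sizes would fail even though the mappings are logically the same.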

arteam avatar May 06 '24 09:05 arteam

Another way to fix the test is to use out.writeString(this.mappings.string()); and CompressedXContent.fromJSON(in.readString()) to store/read the mappings, but that defeats the purpose of storing them compressed.
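
A rough sketch of the trade-off, using `java.util.zip.Deflater` as a stand-in for the real compression. This is an assumption for illustration only; `MappingWireSizeSketch`, `buildMapping` and `deflate` are hypothetical and do not reflect how CompressedXContent works internally.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class MappingWireSizeSketch {

    // Compresses the input with the default DEFLATE settings.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Builds a hypothetical mapping with many similar fields.
    static String buildMapping(int fieldCount) {
        StringBuilder sb = new StringBuilder("{\"_doc\":{\"properties\":{");
        for (int i = 0; i < fieldCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("\"field_").append(i).append("\":{\"type\":\"match_only_text\"}");
        }
        return sb.append("}}}").toString();
    }

    public static void main(String[] args) {
        byte[] raw = buildMapping(50).getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(raw);
        // Writing the plain string sends all raw bytes; the compressed form is smaller.
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
    }
}
```

Mappings tend to be repetitive, so sending them as plain strings would likely increase the size of every serialized template.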

arteam avatar May 06 '24 09:05 arteam

@arteam thanks for looking into this! I am still a bit baffled as to why this also got triggered by using "type": "match_only_text" in a non-template mapping. I will see if I can create a different example.

gmarouli avatar May 08 '24 14:05 gmarouli

Hey, I was looking at this because it is tagged with storage engine, but I don't think RestoreTemplateWithMatchOnlyTextMapperIT exists. Am I missing something?

lkts avatar May 29 '24 21:05 lkts

@lkts I believe that test was added as a reproducer of the issue in #107514

arteam avatar May 30 '24 10:05 arteam

Test from #107514 fails for me even with

{
  "_doc": {
    "properties": {
      "@timestamp": {
        "format": "date_optional_time||epoch_millis",
        "type": "date"
      },
      "flag": {
        "type": "boolean"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}

It passes ~10% of the time so there must be some randomness somewhere?

lkts avatar May 31 '24 17:05 lkts

I'll remove the storage engine team from here since the issue seems to be in MetadataIndexTemplateService, as @arteam mentioned above, and the mappings are logically identical.

lkts avatar Jul 02 '24 21:07 lkts

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine avatar Jul 02 '24 21:07 elasticsearchmachine