[CI] MultiClusterSpecIT class failing

Open elasticsearchmachine opened this issue 1 year ago • 3 comments

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:esql:qa:server:multi-clusters:v8.17.0#bwcTest" -Dtests.class="org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT" -Dtests.method="test {enrich.ShadowingWithAliasLimit0}" -Dtests.seed=BC2B9151E0C19B77 -Dtests.bwc=true -Dtests.locale=se -Dtests.timezone=America/St_Lucia -Druntime.java=22

Applicable branches: main

Reproduces locally?: N/A

Failure History: See dashboard

Failure Message:

org.elasticsearch.client.ResponseException: method [HEAD], host [http://[::1]:42973], URI [/airports], status line [HTTP/1.1 503 Service Unavailable]

Issue Reasons:

  • [main] 28 failures in class org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT (2.9% fail rate in 970 executions)
  • [main] 7 failures in step 8.17.0_bwc-snapshots (3.3% fail rate in 213 executions)
  • [main] 10 failures in step 9.0.0_bwc-snapshots (2.2% fail rate in 448 executions)
  • [main] 11 failures in step 8.16.0_bwc-snapshots (5.2% fail rate in 211 executions)
  • [main] 4 failures in pipeline elasticsearch-intake (3.4% fail rate in 119 executions)
  • [main] 19 failures in pipeline elasticsearch-pull-request (5.4% fail rate in 355 executions)
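
The percentages in the list above appear to be failures divided by executions, rounded to one decimal place. A minimal sketch to recompute them follows; the `fail_rate` helper, the `REPORTED` table, and the half-up rounding mode are assumptions made for illustration, not part of the triage automation.

```python
from decimal import Decimal, ROUND_HALF_UP


def fail_rate(failures: int, executions: int) -> Decimal:
    """Return failures/executions as a percentage with one decimal place."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    pct = Decimal(failures) * 100 / Decimal(executions)
    # Assumption: the report rounds half-up to one decimal place.
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (failures, executions, percentage as shown in the report), copied from the list above.
REPORTED = {
    "class MultiClusterSpecIT": (28, 970, "2.9"),
    "step 8.17.0_bwc-snapshots": (7, 213, "3.3"),
    "step 9.0.0_bwc-snapshots": (10, 448, "2.2"),
    "step 8.16.0_bwc-snapshots": (11, 211, "5.2"),
    "pipeline elasticsearch-intake": (4, 119, "3.4"),
    "pipeline elasticsearch-pull-request": (19, 355, "5.4"),
}

# Names whose recomputed rate differs from the reported one.
mismatches = [
    name
    for name, (failures, executions, shown) in REPORTED.items()
    if fail_rate(failures, executions) != Decimal(shown)
]

for name, (failures, executions, _) in REPORTED.items():
    print(f"{name}: {fail_rate(failures, executions)}%")
```

Under these assumptions every reported rate matches the recomputed one, so `mismatches` stays empty.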

Note: This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine, Oct 18 '24 18:10

This has been muted on branch main

Mute Reasons:

  • [main] 24 failures in class org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT (2.5% fail rate in 961 executions)
  • [main] 8 failures in step 9.0.0_bwc-snapshots (1.8% fail rate in 443 executions)
  • [main] 5 failures in step 8.17.0_bwc-snapshots (2.4% fail rate in 209 executions)
  • [main] 11 failures in step 8.16.0_bwc-snapshots (5.2% fail rate in 211 executions)
  • [main] 2 failures in pipeline elasticsearch-intake (1.7% fail rate in 116 executions)
  • [main] 18 failures in pipeline elasticsearch-pull-request (5.1% fail rate in 353 executions)

Build Scans:

elasticsearchmachine, Oct 18 '24 18:10

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine, Oct 18 '24 18:10

This looks environmental.

fang-xing-esql, Oct 19 '24 01:10

The whole test suite was muted, so I think we need to take another look and see how we can make this less flaky, assuming there's no other reason why this fails more often now than it did before.

There are a bunch of different errors in the test runs. Most are "connection refused" or "node not connected". I wonder why we are getting so many of these disconnects all of a sudden?

alex-spies, Oct 21 '24 08:10

This has been fixed; the unmute is in https://github.com/elastic/elasticsearch/pull/115218

alex-spies, Oct 22 '24 10:10