elasticsearch icon indicating copy to clipboard operation
elasticsearch copied to clipboard

[CI] SearchTransportTelemetryTests testSearchTransportMetricsQueryThenFetch failing

Open DaveCTurner opened this issue 2 years ago • 8 comments

CI Link

Only failed locally, sorry no build scan

Repro line

':server:test' --tests "org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests.testSearchTransportMetricsQueryThenFetch" -Dtests.seed=A7DDD2883CF9F74D:5932D02B45F0F45A -Dtests.locale=is-IS -Dtests.timezone=Asia/Dacca -Druntime.java=21

Does it reproduce?

No

Applicable branches

main

Failure history

https://es-delivery-stats.elastic.dev/app/r/s/9tCmJ

Failure excerpt

No failures in CI it seems, but failed on a local run:

org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests > testSearchTransportMetricsQueryThenFetch FAILED
    java.lang.AssertionError: Values should be different. Actual: 0
        at __randomizedtesting.SeedInfo.seed([A7DDD2883CF9F74D:5932D02B45F0F45A]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failEquals(Assert.java:187)
        at org.junit.Assert.assertNotEquals(Assert.java:201)
        at org.junit.Assert.assertNotEquals(Assert.java:213)
        at org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests.testSearchTransportMetricsQueryThenFetch(SearchTransportTelemetryTests.java:81)
  1> [2024-01-10T07:14:05,007][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] before test
  1> [2024-01-10T07:14:05,013][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: setting up test
  1> [2024-01-10T07:14:05,028][INFO ][o.e.t.InternalTestCluster] [testSearchTransportMetricsQueryThenFetch] Setup InternalTestCluster [TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster] with seed [E245A1F88A1CE78F] using [0] dedicated masters, [1] (data) nodes and [0] coord only nodes (master nodes are [auto-managed])
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] version[8.13.0-SNAPSHOT], pid[1723656], build[unknown/unknown/unknown], OS[Linux/6.2.0-1020-gcp/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/21.0.1/21.0.1+12-29]
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] JVM home [/home/davidturner/.gradle/jdks/oracle_corporation-21-amd64-linux/jdk-21.0.1], using bundled JDK [false]
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] JVM arguments [-Des.insecure_network_trace_enabled=true, -Des.scripting.update.ctx_in_params=false, -Des.search.rewrite_sort=true, -Des.transport.cname_in_publish_address=true, -Dgradle.dist.lib=/home/davidturner/.gradle/wrapper/dists/gradle-8.5-all/3zlzzgtsutfj0pbojr50n2l7z/gradle-8.5/lib, -Dgradle.user.home=/home/davidturner/.gradle, -Dgradle.worker.jar=/home/davidturner/.gradle/caches/8.5/workerMain/gradle-worker.jar, -Dio.netty.noKeySetOptimization=true, -Dio.netty.noUnsafe=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Djava.awt.headless=true, -Djava.locale.providers=SPI,COMPAT, -Djava.security.manager=allow, -Djna.nosys=true, -Dorg.gradle.internal.worker.tmpdir=/home/davidturner/src/elasticsearch/server/build/tmp/test/work, -Dorg.gradle.native=false, -Dtests.artifact=server, -Dtests.gradle=true, -Dtests.gradle-loop-iteration=19, -Dtests.logger.level=WARN, -Dtests.security.manager=true, -Dtests.seed=A7DDD2883CF9F74D, -Dtests.task=:server:test, --add-opens=java.base/java.security.cert=ALL-UNNAMED, --add-opens=java.base/java.nio.channels=ALL-UNNAMED, --add-opens=java.base/java.net=ALL-UNNAMED, --add-opens=java.base/javax.net.ssl=ALL-UNNAMED, --add-opens=java.base/java.nio.file=ALL-UNNAMED, --add-opens=java.base/java.time=ALL-UNNAMED, --add-opens=java.management/java.lang.management=ALL-UNNAMED, -XX:+HeapDumpOnOutOfMemoryError, -esa, -XX:HeapDumpPath=/home/davidturner/src/elasticsearch/server/build/heapdump, -Xms512m, -Xmx512m, -Dfile.encoding=UTF-8, -Djava.io.tmpdir=/home/davidturner/src/elasticsearch/server/build/testrun/test/temp, -Duser.country, -Duser.language=en, -Duser.variant, -ea]
  1> [2024-01-10T07:14:05,035][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] Default Locale [is_IS]
  1> [2024-01-10T07:14:05,036][WARN ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] version [8.13.0-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
  1> [2024-01-10T07:14:05,036][WARN ][o.e.d.n.Node             ] [testSearchTransportMetricsQueryThenFetch] data_stream.dataset="deprecation.elasticsearch" data_stream.namespace="default" data_stream.type="logs" elasticsearch.event.category="settings" event.code="shared-data-path" message="setting [path.shared_data] is deprecated and will be removed in a future release"
  1> [2024-01-10T07:14:05,232][INFO ][o.e.e.NodeEnvironment    ] [testSearchTransportMetricsQueryThenFetch] using [1] data paths, mounts [[/ (/dev/root)]], net usable_space [224.8gb], net total_space [242.2gb], types [ext4]
  1> [2024-01-10T07:14:05,233][INFO ][o.e.e.NodeEnvironment    ] [testSearchTransportMetricsQueryThenFetch] heap size [512mb], compressed ordinary object pointers [true]
  1> [2024-01-10T07:14:05,236][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] node name [node_t0], node ID [9pmAQkDpT2-BAiFU_LPorA], cluster name [TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster], roles [master, data_warm, data_content, transform, data_hot, ml, data_frozen, ingest, data_cold, data, remote_cluster_client]
  1> [2024-01-10T07:14:05,328][INFO ][o.e.f.FeatureService     ] [testSearchTransportMetricsQueryThenFetch] Registered local node features [features_supported]
  1> [2024-01-10T07:14:05,350][INFO ][o.e.i.r.RecoverySettings ] [testSearchTransportMetricsQueryThenFetch] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]
  1> [2024-01-10T07:14:05,352][INFO ][o.e.d.DiscoveryModule    ] [testSearchTransportMetricsQueryThenFetch] using discovery type [multi-node] and seed hosts providers [settings, file]
  1> [2024-01-10T07:14:05,569][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] initialized
  1> [2024-01-10T07:14:05,596][INFO ][o.e.n.Node               ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] starting ...
  1> [2024-01-10T07:14:05,676][INFO ][o.e.t.TransportService   ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] publish_address {127.0.0.1:28181}, bound_addresses {[::1]:28181}, {127.0.0.1:28181}
  1> [2024-01-10T07:14:05,724][INFO ][o.e.c.c.ClusterBootstrapService] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] this node has not joined a bootstrapped cluster yet; [cluster.initial_master_nodes] is set to [node_t0]
  1> [2024-01-10T07:14:05,747][INFO ][o.e.c.c.Coordinator      ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] setting initial configuration to VotingConfiguration{9pmAQkDpT2-BAiFU_LPorA}
  1> [2024-01-10T07:14:05,845][INFO ][o.e.c.s.MasterService    ] [node_t0] elected-as-master ([1] nodes joined in term 1)[_FINISH_ELECTION_, {node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000} completing election], term: 1, version: 1, delta: master node changed {previous [], current [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}]}
  1> [2024-01-10T07:14:05,880][INFO ][o.e.c.c.CoordinationState] [node_t0] cluster UUID set to [I8Rrx0LzSuqGvOAS0lB8Jg]
  1> [2024-01-10T07:14:05,900][INFO ][o.e.c.s.ClusterApplierService] [node_t0] master node changed {previous [], current [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}]}, term: 1, version: 1, reason: Publication{term=1, version=1}
  1> [2024-01-10T07:14:05,930][INFO ][o.e.c.f.AbstractFileWatchingService] [node_t0] starting file watcher ...
  1> [2024-01-10T07:14:05,970][INFO ][o.e.c.c.NodeJoinExecutor ] [node_t0] node-join: [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}] with reason [completing election]
  1> [2024-01-10T07:14:05,972][INFO ][o.e.c.f.AbstractFileWatchingService] [[elasticsearch[file-watcher[/home/davidturner/src/elasticsearch/server/build/testrun/test/temp/org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests_A7DDD2883CF9F74D-001/tempDir-002/config/operator/settings.json]]]] file settings service up and running [tid=1396]
  1> [2024-01-10T07:14:05,998][INFO ][o.e.n.Node               ] [[test_TEST-TEST_WORKER_VM=[1142]-CLUSTER_SEED=[-2142127958903691377]-HASH=[B8C2190D1CAC]-cluster[T#1]]] started {node_t0}{9pmAQkDpT2-BAiFU_LPorA}{a2O6zoWDQFiQrNroIsfMsQ}{node_t0}{127.0.0.1}{127.0.0.1:28181}{cdfhilmrstw}{8.13.0}{7000099-8501000}
  1> [2024-01-10T07:14:06,024][INFO ][o.e.g.GatewayService     ] [node_t0] recovered [0] indices into cluster_state
  1> [2024-01-10T07:14:06,040][INFO ][o.e.h.n.s.HealthNodeTaskExecutor] [node_t0] Node [{node_t0}{9pmAQkDpT2-BAiFU_LPorA}] is selected as the current health node.
  1> [2024-01-10T07:14:06,159][INFO ][o.e.c.m.MetadataIndexTemplateService] [node_t0] adding template [random_index_template] for index patterns [*]
  1> [2024-01-10T07:14:06,181][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: all set up test
  1> [2024-01-10T07:14:06,188][INFO ][o.e.c.m.MetadataCreateIndexService] [node_t0] [test2] creating index, cause [api], templates [random_index_template], shards [3]/[0]
  1> [2024-01-10T07:14:06,478][INFO ][o.e.c.r.a.AllocationService] [node_t0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[test2][0]]])." previous.health="YELLOW" reason="shards started [[test2][0]]"
  1> [2024-01-10T07:14:06,502][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] Index [1] docs async: [true] bulk: [false]
  1> [2024-01-10T07:14:06,613][INFO ][o.e.c.m.MetadataMappingService] [node_t0] [test2/LloPpu4jSU6HaBIMKMpbyA] create_mapping
  1> [2024-01-10T07:14:06,994][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: cleaning up after test
  1> [2024-01-10T07:14:07,173][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_t0] [test2/LloPpu4jSU6HaBIMKMpbyA] deleting index
  1> [2024-01-10T07:14:07,203][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][1] start check index
  1> [2024-01-10T07:14:07,214][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][1] end check index
  1> [2024-01-10T07:14:07,221][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][0] start check index
  1> [2024-01-10T07:14:07,223][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][0] end check index
  1> [2024-01-10T07:14:07,227][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][2] start check index
  1> [2024-01-10T07:14:07,230][INFO ][o.e.t.s.M.Listener       ] [node_t0] [test2][2] end check index
  1> [2024-01-10T07:14:07,246][INFO ][o.e.c.m.MetadataIndexTemplateService] [node_t0] removing template [random_index_template]
  1> [2024-01-10T07:14:07,261][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] stopping ...
  1> [2024-01-10T07:14:07,264][INFO ][o.e.c.f.AbstractFileWatchingService] [[elasticsearch[file-watcher[/home/davidturner/src/elasticsearch/server/build/testrun/test/temp/org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests_A7DDD2883CF9F74D-001/tempDir-002/config/operator/settings.json]]]] shutting down watcher thread
  1> [2024-01-10T07:14:07,270][INFO ][o.e.c.f.AbstractFileWatchingService] [testSearchTransportMetricsQueryThenFetch] watcher service stopped
  1> [2024-01-10T07:14:07,282][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] stopped
  1> [2024-01-10T07:14:07,282][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] closing ...
  1> [2024-01-10T07:14:07,286][INFO ][o.e.n.Node               ] [testSearchTransportMetricsQueryThenFetch] closed
  1> [2024-01-10T07:14:07,291][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] [SearchTransportTelemetryTests#testSearchTransportMetricsQueryThenFetch]: cleaned up after test
  1> [2024-01-10T07:14:07,292][INFO ][o.e.s.T.SearchTransportTelemetryTests] [testSearchTransportMetricsQueryThenFetch] after test

DaveCTurner avatar Jan 10 '24 08:01 DaveCTurner

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine avatar Jan 10 '24 08:01 elasticsearchmachine

@JVerwolf looks like most of SearchTransportTelemetryTests are flaky and muted. Mind determining impact and digging in?

benwtrent avatar Jan 10 '24 12:01 benwtrent

Here's a build scan of this failing on main: https://gradle-enterprise.elastic.co/s/c2pzftpx5sxz2/tests/:server:test/org.elasticsearch.search.TelemetryMetrics.SearchTransportTelemetryTests/testSearchTransportMetricsQueryThenFetch.

felixbarny avatar Feb 20 '24 09:02 felixbarny

Another instance on a PR, it seems the original test here was not muted so I've done so in ad47ffbb2d5

rjernst avatar Mar 12 '24 22:03 rjernst

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine avatar Jul 17 '24 17:07 elasticsearchmachine

@JVerwolf , please provide a risk assessment on this muted test.

julio-santana avatar Oct 29 '24 22:10 julio-santana

Labeling this as low risk due to it targeting transport telemetry

andreidan avatar Oct 30 '24 16:10 andreidan

Agreed, @andreidan. I spent a number of cycles trying to reproduce, but there must be some logic that causes various transport actions to be called differently depending on the random cluster topology (e.g. via an optimization).

IIR, @carlosdelest was adding telemetry that will replace this. When that happens, these can be deleted.

JVerwolf avatar Oct 30 '24 17:10 JVerwolf

IIR, @carlosdelest was adding telemetry that will replace this. When that happens, these can be deleted.

@JVerwolf correct, these are removed as part of https://github.com/elastic/elasticsearch/pull/113194

carlosdelest avatar Nov 05 '24 07:11 carlosdelest

Closed by https://github.com/elastic/elasticsearch/pull/113194

JVerwolf avatar Jan 09 '25 20:01 JVerwolf