[CI] DownsampleActionSingleNodeTests testCannotDownsampleWhileOtherDownsampleInProgress failing
Build scan: https://gradle-enterprise.elastic.co/s/ygv5qoxopbkqs/tests/:x-pack:plugin:downsample:test/org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests/testCannotDownsampleWhileOtherDownsampleInProgress
Reproduction line:
./gradlew ':x-pack:plugin:downsample:test' --tests "org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests.testCannotDownsampleWhileOtherDownsampleInProgress" -Dtests.seed=FCE9B44B6379CBC3 -Dtests.locale=ar-LY -Dtests.timezone=Africa/Ceuta -Druntime.java=21 -Dtests.fips.enabled=true
Applicable branches: main
Reproduces locally?: No
Failure history:
Failure dashboard for org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests#testCannotDownsampleWhileOtherDownsampleInProgress
Failure excerpt:
org.elasticsearch.ElasticsearchException: downsample task [downsample-downsample-gsbskprihwaewu-0-351ms] failed
at __randomizedtesting.SeedInfo.seed([FCE9B44B6379CBC3:DC864329F5D5E5D6]:0)
at org.elasticsearch.xpack.downsample.TransportDownsampleAction$2.onResponse(TransportDownsampleAction.java:497)
at org.elasticsearch.xpack.downsample.TransportDownsampleAction$2.onResponse(TransportDownsampleAction.java:489)
at org.elasticsearch.persistent.PersistentTasksService$1.onNewClusterState(PersistentTasksService.java:195)
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:375)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.clusterChanged(ClusterStateObserver.java:226)
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:561)
at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:548)
at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:506)
at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:430)
at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:155)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Looking at the logs, there's a race between the two downsampling actions: the first one completes, after which the downsample index can no longer be written to, so the second one fails with a `ClusterBlockException` (`FORBIDDEN/8/index write (api)`):
[2024-04-25T15:59:36,589][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] started
[2024-04-25T15:59:36,596][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [gsbskprihwaewu][0] processed [2270] docs, created [535] downsample buckets
[2024-04-25T15:59:36,597][INFO ][o.e.c.r.a.AllocationService] [node_s_0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[downsample-gsbskprihwaewu][0]]])." previous.health="YELLOW" reason="shards started [[downsample-gsbskprihwaewu][0]]"
[2024-04-25T15:59:36,689][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] successfully sent [2270], received source doc [535], indexed downsampled doc [535], failed [0], took [0s]
[2024-04-25T15:59:36,689][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] completed
[2024-04-25T15:59:36,701][INFO ][o.e.x.d.TransportDownsampleAction] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms completed for shard [gsbskprihwaewu][0]
[2024-04-25T15:59:36,701][INFO ][o.e.x.d.TransportDownsampleAction] [node_s_0] All downsampling tasks completed [1]
[2024-04-25T15:59:36,751][WARN ][o.e.p.PersistentTasksClusterService] [node_s_0] trying to update state on task downsample-downsample-gsbskprihwaewu-0-351ms with unexpected allocation id 14
[2024-04-25T15:59:36,759][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] started
[2024-04-25T15:59:36,764][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [gsbskprihwaewu][0] processed [2270] docs, created [535] downsample buckets
[2024-04-25T15:59:36,776][ERROR][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] failed to populate downsample index. Failures: [{null=org.elasticsearch.cluster.block.ClusterBlockException: index [downsample-gsbskprihwaewu] blocked by: [FORBIDDEN/8/index write (api)];}]
[2024-04-25T15:59:36,777][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] successfully sent [2270], received source doc [535], indexed downsampled doc [535], failed [535], took [0s]
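The sequence in these logs can be sketched in plain Java. This is only an illustration, assuming the first action leaves the target index write-blocked once it completes; `DownsampleRaceSketch`, `TargetIndex`, and `runAttempt` are hypothetical names, not Elasticsearch classes, and the two attempts are run in a fixed order instead of concurrently so that the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

public class DownsampleRaceSketch {

    /** Hypothetical stand-in for the target downsample index; not an Elasticsearch class. */
    static final class TargetIndex {
        private final List<String> docs = new ArrayList<>();
        private boolean writeBlocked = false;

        /** Stores one document, or rejects it once the write block is in place. */
        synchronized void index(String doc) {
            if (writeBlocked) {
                throw new IllegalStateException("blocked by: index write");
            }
            docs.add(doc);
        }

        /** Assumption: a completed downsample action leaves the target index write-blocked. */
        synchronized void addWriteBlock() {
            writeBlocked = true;
        }

        synchronized int size() {
            return docs.size();
        }
    }

    /** Runs one simulated downsample attempt and returns the number of rejected bucket writes. */
    static int runAttempt(TargetIndex target, int buckets) {
        int failed = 0;
        for (int i = 0; i < buckets; i++) {
            try {
                target.index("bucket-" + i);
            } catch (IllegalStateException e) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        TargetIndex target = new TargetIndex();

        // First attempt: all 535 buckets are written.
        int firstFailed = runAttempt(target, 535);

        // The first action completes and the index becomes read-only.
        target.addWriteBlock();

        // Second attempt for the same shard starts late: every write is rejected.
        int secondFailed = runAttempt(target, 535);

        System.out.println("first attempt failed writes: " + firstFailed);
        System.out.println("second attempt failed writes: " + secondFailed);
    }
}
```

In this sketch the second attempt reports 535 rejected writes, which lines up with the `failed [535]` entry above.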
@slobodanadamovic was the branch up-to-date? I submitted a fix for this in #107213, wonder if it's included.
@kkrik-es Yes, the branch was up-to-date. I had just merged the latest changes from main before I reported it.
Thanks for confirming, lemme reopen the original bug and mark this as a duplicate of #107210