
CI Failure (GetParam() = false) in `gtest_cluster_cloud_metadata_rpfixture.WithLeadershipChanges/ClusterRecoveryBackendLeadershipParamTest.TestRecoveryControllerState`

Open andijcr opened this issue 2 months ago • 5 comments

https://buildkite.com/redpanda/redpanda/builds/47610#018ec863-93f3-4657-8144-7da19cead72b

gtest_cluster_cloud_metadata_rpfixture

`WithLeadershipChanges/ClusterRecoveryBackendLeadershipParamTest.TestRecoveryControllerState` where GetParam() = false

release build

The failure should be on dev: the originating PR does not modify any C++ code.

JIRA Link: CORE-2338

andijcr avatar Apr 11 '24 19:04 andijcr

@andijcr would appreciate it if you used the CI failure template

dotnwat avatar Apr 12 '24 03:04 dotnwat

> @andijcr would appreciate it if you used the CI failure template

@dotnwat Do we have a template for fixture test failures? The one we have is for ducktape and it's not clear what to write and where

andijcr avatar Apr 12 '24 10:04 andijcr

We should make one because these are fairly common. The problem with the ducktape one is that it breaks pandatriage.

rockwotj avatar Apr 12 '24 16:04 rockwotj

@andijcr good point I was reading this too fast and thought it was ducktape :)

dotnwat avatar Apr 12 '24 19:04 dotnwat

73 bytes)}, writer=nullptr, cache=nullptr, compaction_index:nullopt, closed=0, tombstone=0, index={file:test.dir_1712765242/redpanda/kvstore/0_0/0-0-v1.base_index, offsets:0, index:{header_bitflags:0, base_offset:0, max_offset:38, base_timestamp:{timestamp: 1712765243573}, max_timestamp:{timestamp: 1712765243842}, batch_timestamps_are_monotonic:1, with_offset:false, non_data_timestamps:0, broker_timestamp:{{timestamp: 1712765243842}}, num_compactible_records_appended:{39}, index(1,1,1)}, step:32768, needs_persistence:0}}
unknown file: Failure
C++ exception with description "configuration property cloud_storage_secret_key is not set" thrown in the test body.

[  FAILED  ] WithLeadershipChanges/ClusterRecoveryBackendLeadershipParamTest.TestRecoveryControllerState/0, where GetParam() = false (1097 ms)

This looks like a test bug. We reset the shard local config immediately after restarting the application, but before the restore completes. This causes a race: by the time the cluster restore attempts to perform topic recovery, the cluster configs for cloud storage have already been wiped.
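
To make the ordering problem concrete, here is a minimal, deterministic sketch of it. This is not the Redpanda fixture code: `fake_config`, `start_restore`, and the two test-like functions are hypothetical names, and a plain `std::function` stands in for the still-pending asynchronous restore. The only detail taken from the log is the exception message.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the shard-local config (not Redpanda's real type).
struct fake_config {
    std::optional<std::string> cloud_storage_secret_key;

    // Throws if the property was wiped, mirroring the message in the log.
    const std::string& secret_key() const {
        if (!cloud_storage_secret_key) {
            throw std::runtime_error(
              "configuration property cloud_storage_secret_key is not set");
        }
        return *cloud_storage_secret_key;
    }

    void reset() { cloud_storage_secret_key.reset(); }
};

// Models a restore that has been started but has not finished: the returned
// callable reads the config only when it is finally invoked.
std::function<bool()> start_restore(const fake_config& cfg) {
    return [&cfg] { return !cfg.secret_key().empty(); };
}

// Problematic ordering: the config is reset while the restore is pending.
bool reset_before_restore_completes() {
    fake_config cfg{std::string("secret")};
    auto pending_restore = start_restore(cfg);
    cfg.reset(); // wipes the key that the restore still needs
    try {
        return pending_restore();
    } catch (const std::runtime_error&) {
        return false; // restore failed: the property is no longer set
    }
}

// Safer ordering: let the restore finish, then reset the config.
bool reset_after_restore_completes() {
    fake_config cfg{std::string("secret")};
    auto pending_restore = start_restore(cfg);
    bool restored = pending_restore();
    cfg.reset();
    return restored;
}
```

In this model, `reset_before_restore_completes()` returns `false` and `reset_after_restore_completes()` returns `true`. A fix along these lines would presumably have the test wait for the restore to complete before resetting the shard local config, though the actual change may look different.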

andrwng avatar Apr 27 '24 01:04 andrwng