server: TestCachedSettingsServerRestart failed
server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ d9b0e5f8cefa99bdcc217f6be790d424c603031c:
=== RUN TestCachedSettingsServerRestart
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart1780436221
test_log_scope.go:79: use -show-logs to present logs inline
settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[186 121 60 169 10 38 4 116 114 117 101 24 178 155 218 228 12 160 211 211 158 2 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[245 212 250 113 10 38 10 18 8 8 23 16 1 24 0 32 0 24 178 155 218 228 12 144 162 253 235 3 22 1 109] Timestamp:0,0}}]
Actual: [{Key:/Table/6/1/"cluster.secret"/0 Value:{RawBytes:[167 136 241 49 10 38 36 100 99 99 49 49 57 51 100 45 49 52 98 52 45 52 49 56 50 45 98 98 55 49 45 57 102 98 56 99 99 100 54 49 54 56 97 24 178 155 218 228 12 144 225 129 180 5 22 1 115] Timestamp:0,0}} {Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[186 121 60 169 10 38 4 116 114 117 101 24 178 155 218 228 12 160 211 211 158 2 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[245 212 250 113 10 38 10 18 8 8 23 16 1 24 0 32 0 24 178 155 218 228 12 144 162 253 235 3 22 1 109] Timestamp:0,0}}]
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart1780436221
--- FAIL: TestCachedSettingsServerRestart (230.54s)
Parameters:
TAGS=bazel,gss,race
Jira issue: CRDB-38882
server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ 4534017ff77b216dcad4301f13d8ee13cf7fd423:
=== RUN TestCachedSettingsServerRestart
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3791630463
test_log_scope.go:79: use -show-logs to present logs inline
settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[90 19 170 150 10 38 4 116 114 117 101 24 128 138 214 231 12 208 237 197 155 3 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[24 40 96 157 10 38 10 18 8 8 23 16 1 24 0 32 0 24 128 138 214 231 12 208 165 130 224 4 22 1 109] Timestamp:0,0}}]
Actual: [{Key:/Table/6/1/"cluster.secret"/0 Value:{RawBytes:[28 1 185 15 10 38 36 50 101 54 100 56 56 48 97 45 97 50 55 98 45 52 52 55 53 45 56 54 52 49 45 99 51 54 50 98 51 57 57 54 49 102 51 24 128 138 214 231 12 176 160 233 153 6 22 1 115] Timestamp:0,0}} {Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[90 19 170 150 10 38 4 116 114 117 101 24 128 138 214 231 12 208 237 197 155 3 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[24 40 96 157 10 38 10 18 8 8 23 16 1 24 0 32 0 24 128 138 214 231 12 208 165 130 224 4 22 1 109] Timestamp:0,0}}]
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3791630463
--- FAIL: TestCachedSettingsServerRestart (230.37s)
Parameters:
TAGS=bazel,gss,race
Same failure on other branches
- #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-server-and-security branch-release-23.1.23-rc release-blocker]
server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ fbcb992a72c3ac2a9af96f6238a24b3978bcaadf:
=== RUN TestCachedSettingsServerRestart
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3168928694
test_log_scope.go:79: use -show-logs to present logs inline
settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[251 2 248 210 10 38 4 116 114 117 101 24 212 168 128 232 12 160 253 155 157 7 22 1 98] Timestamp:0,0}}]
Actual: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[251 2 248 210 10 38 4 116 114 117 101 24 212 168 128 232 12 160 253 155 157 7 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[151 128 229 124 10 38 10 18 8 8 23 16 1 24 0 32 0 24 214 168 128 232 12 240 186 254 165 1 22 1 109] Timestamp:0,0}}]
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3168928694
--- FAIL: TestCachedSettingsServerRestart (230.01s)
Parameters:
TAGS=bazel,gss,race
Same failure on other branches
- #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-server-and-security branch-release-23.1.23-rc release-blocker]
server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ 2073003f4a56bcb7eba3e76deae2df14151ed137:
=== RUN TestCachedSettingsServerRestart
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3441775381
test_log_scope.go:79: use -show-logs to present logs inline
settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[111 25 204 142 10 38 4 116 114 117 101 24 246 182 253 233 12 144 146 222 17 22 1 98] Timestamp:0,0}}]
Actual: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[111 25 204 142 10 38 4 116 114 117 101 24 246 182 253 233 12 144 146 222 17 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[81 119 54 33 10 38 10 18 8 8 23 16 1 24 0 32 0 24 246 182 253 233 12 176 137 192 212 1 22 1 109] Timestamp:0,0}}]
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3441775381
--- FAIL: TestCachedSettingsServerRestart (230.14s)
Parameters:
TAGS=bazel,gss,race
Same failure on other branches
- #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-server-and-security branch-release-23.1.23-rc release-blocker]
server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ 7748ab2d671c6e8d021af1ca577d2de4578751db:
=== RUN TestCachedSettingsServerRestart
test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3485642125
test_log_scope.go:79: use -show-logs to present logs inline
settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[77 95 233 0 10 38 4 116 114 117 101 24 226 139 145 235 12 128 231 132 128 5 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[205 190 79 226 10 38 10 18 8 8 23 16 1 24 0 32 0 24 226 139 145 235 12 176 197 172 197 6 22 1 109] Timestamp:0,0}}]
Actual: [{Key:/Table/6/1/"cluster.secret"/0 Value:{RawBytes:[219 97 94 49 10 38 36 100 52 54 49 53 48 97 57 45 56 48 102 56 45 52 52 97 57 45 57 53 102 49 45 97 51 52 50 102 57 53 102 55 52 55 51 24 228 139 145 235 12 208 171 233 73 22 1 115] Timestamp:0,0}} {Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[77 95 233 0 10 38 4 116 114 117 101 24 226 139 145 235 12 128 231 132 128 5 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[205 190 79 226 10 38 10 18 8 8 23 16 1 24 0 32 0 24 226 139 145 235 12 176 197 172 197 6 22 1 109] Timestamp:0,0}}]
panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3485642125
--- FAIL: TestCachedSettingsServerRestart (230.47s)
Parameters:
TAGS=bazel,gss,race
Same failure on other branches
- #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-product-security branch-release-23.1.23-rc release-blocker]
Hi @nicktrav, does this test belong more to storage or to server? Based on the test setup, it looks to be more in the storage domain. Also, this test seems to have a history of flaking under race, as mentioned in https://github.com/cockroachdb/cockroach/issues/117813, so it is probably not a release blocker. The DB Server team is primarily focusing on UA for now, so it's unlikely we will get to this anytime soon, but I can pull it into our backlog if we agree that this is more of a server thing than a storage one.
This is also failing on the 23.1 rc branches: https://github.com/cockroachdb/cockroach/issues/128977, https://github.com/cockroachdb/cockroach/issues/125429.
This looks more like server. Yes, the test uses Storage APIs, but this is all testing bits and pieces above Storage.
I think the fix for the first issue in this line, https://github.com/cockroachdb/cockroach/issues/111742, fixed TestCachedSettingDeletionIsPersisted in https://github.com/cockroachdb/cockroach/pull/111758. So for now, maybe all we need to do is apply that fix to TestCachedSettingsServerRestart too.
Oops... my bad, it seems https://github.com/cockroachdb/cockroach/pull/111758 was backported in https://github.com/cockroachdb/cockroach/pull/111785 to 23.1 for TestCachedSettingsServerRestart.
@shubhamdhama and I investigated this together a bit this morning and think that it is a simple race condition in the test.
Namely, the permanent upgrades set 3 cluster settings during startup. Eventually, the settings watcher observes those 3 settings and persists them to the settings cache. But the test has no coordination between the settings cache and the shutdown. Thus, it may capture an initial state with only 1 or 2 settings persisted to the cache, and then compare that initial state against the end state of 3 settings persisted to the cache.
A simple solution for now may be to assert the number of settings we expect in the initial state based on our knowledge of what settings get written at startup.