
server: TestCachedSettingsServerRestart failed

Open cockroach-teamcity opened this issue 1 year ago • 2 comments

server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ d9b0e5f8cefa99bdcc217f6be790d424c603031c:

=== RUN   TestCachedSettingsServerRestart
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart1780436221
    test_log_scope.go:79: use -show-logs to present logs inline
    settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
        Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[186 121 60 169 10 38 4 116 114 117 101 24 178 155 218 228 12 160 211 211 158 2 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[245 212 250 113 10 38 10 18 8 8 23 16 1 24 0 32 0 24 178 155 218 228 12 144 162 253 235 3 22 1 109] Timestamp:0,0}}]
        Actual:   [{Key:/Table/6/1/"cluster.secret"/0 Value:{RawBytes:[167 136 241 49 10 38 36 100 99 99 49 49 57 51 100 45 49 52 98 52 45 52 49 56 50 45 98 98 55 49 45 57 102 98 56 99 99 100 54 49 54 56 97 24 178 155 218 228 12 144 225 129 180 5 22 1 115] Timestamp:0,0}} {Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[186 121 60 169 10 38 4 116 114 117 101 24 178 155 218 228 12 160 211 211 158 2 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[245 212 250 113 10 38 10 18 8 8 23 16 1 24 0 32 0 24 178 155 218 228 12 144 162 253 235 3 22 1 109] Timestamp:0,0}}]
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart1780436221
--- FAIL: TestCachedSettingsServerRestart (230.54s)

Parameters:

  • TAGS=bazel,gss,race
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/server

This test on roachdash | Improve this report!

Jira issue: CRDB-38882

cockroach-teamcity avatar May 20 '24 13:05 cockroach-teamcity

server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ 4534017ff77b216dcad4301f13d8ee13cf7fd423:

=== RUN   TestCachedSettingsServerRestart
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3791630463
    test_log_scope.go:79: use -show-logs to present logs inline
    settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
        Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[90 19 170 150 10 38 4 116 114 117 101 24 128 138 214 231 12 208 237 197 155 3 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[24 40 96 157 10 38 10 18 8 8 23 16 1 24 0 32 0 24 128 138 214 231 12 208 165 130 224 4 22 1 109] Timestamp:0,0}}]
        Actual:   [{Key:/Table/6/1/"cluster.secret"/0 Value:{RawBytes:[28 1 185 15 10 38 36 50 101 54 100 56 56 48 97 45 97 50 55 98 45 52 52 55 53 45 56 54 52 49 45 99 51 54 50 98 51 57 57 54 49 102 51 24 128 138 214 231 12 176 160 233 153 6 22 1 115] Timestamp:0,0}} {Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[90 19 170 150 10 38 4 116 114 117 101 24 128 138 214 231 12 208 237 197 155 3 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[24 40 96 157 10 38 10 18 8 8 23 16 1 24 0 32 0 24 128 138 214 231 12 208 165 130 224 4 22 1 109] Timestamp:0,0}}]
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3791630463
--- FAIL: TestCachedSettingsServerRestart (230.37s)

Parameters:

  • TAGS=bazel,gss,race

Same failure on other branches

  • #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-server-and-security branch-release-23.1.23-rc release-blocker]

cockroach-teamcity avatar Jun 25 '24 13:06 cockroach-teamcity

server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ fbcb992a72c3ac2a9af96f6238a24b3978bcaadf:

=== RUN   TestCachedSettingsServerRestart
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3168928694
    test_log_scope.go:79: use -show-logs to present logs inline
    settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
        Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[251 2 248 210 10 38 4 116 114 117 101 24 212 168 128 232 12 160 253 155 157 7 22 1 98] Timestamp:0,0}}]
        Actual:   [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[251 2 248 210 10 38 4 116 114 117 101 24 212 168 128 232 12 160 253 155 157 7 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[151 128 229 124 10 38 10 18 8 8 23 16 1 24 0 32 0 24 214 168 128 232 12 240 186 254 165 1 22 1 109] Timestamp:0,0}}]
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3168928694
--- FAIL: TestCachedSettingsServerRestart (230.01s)

Parameters:

  • TAGS=bazel,gss,race

Same failure on other branches

  • #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-server-and-security branch-release-23.1.23-rc release-blocker]

cockroach-teamcity avatar Jun 29 '24 13:06 cockroach-teamcity

server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ 2073003f4a56bcb7eba3e76deae2df14151ed137:

=== RUN   TestCachedSettingsServerRestart
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3441775381
    test_log_scope.go:79: use -show-logs to present logs inline
    settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
        Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[111 25 204 142 10 38 4 116 114 117 101 24 246 182 253 233 12 144 146 222 17 22 1 98] Timestamp:0,0}}]
        Actual:   [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[111 25 204 142 10 38 4 116 114 117 101 24 246 182 253 233 12 144 146 222 17 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[81 119 54 33 10 38 10 18 8 8 23 16 1 24 0 32 0 24 246 182 253 233 12 176 137 192 212 1 22 1 109] Timestamp:0,0}}]
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3441775381
--- FAIL: TestCachedSettingsServerRestart (230.14s)

Parameters:

  • TAGS=bazel,gss,race

Same failure on other branches

  • #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-server-and-security branch-release-23.1.23-rc release-blocker]

cockroach-teamcity avatar Jul 23 '24 13:07 cockroach-teamcity

server.TestCachedSettingsServerRestart failed with artifacts on release-23.1 @ 7748ab2d671c6e8d021af1ca577d2de4578751db:

=== RUN   TestCachedSettingsServerRestart
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3485642125
    test_log_scope.go:79: use -show-logs to present logs inline
    settings_cache_test.go:141: condition failed to evaluate within 3m45s: initial state settings KVs does not match expected settings
        Expected: [{Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[77 95 233 0 10 38 4 116 114 117 101 24 226 139 145 235 12 128 231 132 128 5 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[205 190 79 226 10 38 10 18 8 8 23 16 1 24 0 32 0 24 226 139 145 235 12 176 197 172 197 6 22 1 109] Timestamp:0,0}}]
        Actual:   [{Key:/Table/6/1/"cluster.secret"/0 Value:{RawBytes:[219 97 94 49 10 38 36 100 52 54 49 53 48 97 57 45 56 48 102 56 45 52 52 97 57 45 57 53 102 49 45 97 51 52 50 102 57 53 102 55 52 55 51 24 228 139 145 235 12 208 171 233 73 22 1 115] Timestamp:0,0}} {Key:/Table/6/1/"diagnostics.reporting.enabled"/0 Value:{RawBytes:[77 95 233 0 10 38 4 116 114 117 101 24 226 139 145 235 12 128 231 132 128 5 22 1 98] Timestamp:0,0}} {Key:/Table/6/1/"version"/0 Value:{RawBytes:[205 190 79 226 10 38 10 18 8 8 23 16 1 24 0 32 0 24 226 139 145 235 12 176 197 172 197 6 22 1 109] Timestamp:0,0}}]
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestCachedSettingsServerRestart3485642125
--- FAIL: TestCachedSettingsServerRestart (230.47s)

Parameters:

  • TAGS=bazel,gss,race

Same failure on other branches

  • #125429 server: TestCachedSettingsServerRestart failed [C-test-failure O-robot T-product-security branch-release-23.1.23-rc release-blocker]

cockroach-teamcity avatar Aug 06 '24 13:08 cockroach-teamcity

Hi @nicktrav, would this test belong more to storage or server? Based on the test setup, it looks to be more in the storage domain. Also, this test has a history of flaking under race, as mentioned in https://github.com/cockroachdb/cockroach/issues/117813, so it is probably not a release blocker. The DB Server team is primarily focusing on UA for now, so it's unlikely we will get to this anytime soon, but I can pull it into our backlog if we agree that this should be more of a server thing as opposed to storage.

rimadeodhar avatar Aug 29 '24 21:08 rimadeodhar

It is also failing on the 23.1 rc branches: https://github.com/cockroachdb/cockroach/issues/128977, https://github.com/cockroachdb/cockroach/issues/125429.

rimadeodhar avatar Aug 29 '24 21:08 rimadeodhar

This looks more like server. Yes, the test uses Storage APIs, but this is all testing bits and pieces above Storage.

nicktrav avatar Aug 30 '24 16:08 nicktrav

I think the first issue in this line, https://github.com/cockroachdb/cockroach/issues/111742, was fixed for TestCachedSettingDeletionIsPersisted in https://github.com/cockroachdb/cockroach/pull/111758. So for now, maybe all we need to do is apply that fix to TestCachedSettingsServerRestart too.

shubhamdhama avatar Sep 06 '24 16:09 shubhamdhama

Oops... my bad, it seems https://github.com/cockroachdb/cockroach/pull/111758 was backported in https://github.com/cockroachdb/cockroach/pull/111785 to 23.1 for TestCachedSettingsServerRestart.

shubhamdhama avatar Sep 06 '24 18:09 shubhamdhama

@shubhamdhama and I investigated this together a bit this morning and think that it is a simple race condition in the test.

Namely, the permanent upgrades set 3 cluster settings during startup. Eventually those 3 settings are observed by the settings watcher and persisted to the settings cache. But, the test has no coordination between the settings cache and the shutdown. Thus, it may observe an initial state of only 1 or 2 settings persisted to the cache, and then end up comparing that initial state to the end state of 3 settings persisted to the cache.

A simple solution for now may be to assert the number of settings we expect in the initial state based on our knowledge of what settings get written at startup.
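
To make the suggestion concrete, here is a minimal standalone sketch of the idea, not the actual test code. The `settingsCache` type and the `waitForSettings` helper are hypothetical stand-ins for the real settings cache and test utilities, and the three setting names are the ones that appear in the "Actual" output of the failures above. The point is only that the test should wait until the expected number of settings has been persisted before it records the initial state.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// settingsCache is a hypothetical stand-in for the persisted settings cache.
type settingsCache struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func newSettingsCache() *settingsCache {
	return &settingsCache{keys: make(map[string]struct{})}
}

// persist records a setting key, as the settings watcher would do asynchronously.
func (c *settingsCache) persist(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keys[key] = struct{}{}
}

// snapshot returns the persisted keys in sorted order.
func (c *settingsCache) snapshot() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]string, 0, len(c.keys))
	for k := range c.keys {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// waitForSettings polls until exactly want settings are persisted, or fails
// once the timeout has elapsed. Only then is it safe to treat the snapshot
// as the initial state.
func waitForSettings(c *settingsCache, want int, timeout time.Duration) ([]string, error) {
	deadline := time.Now().Add(timeout)
	for {
		got := c.snapshot()
		if len(got) == want {
			return got, nil
		}
		if time.Now().After(deadline) {
			return nil, fmt.Errorf("expected %d settings, found %d: %v", want, len(got), got)
		}
		time.Sleep(5 * time.Millisecond)
	}
}

func main() {
	cache := newSettingsCache()
	// Simulate the startup settings reaching the cache one at a time.
	go func() {
		for _, k := range []string{"cluster.secret", "diagnostics.reporting.enabled", "version"} {
			time.Sleep(10 * time.Millisecond)
			cache.persist(k)
		}
	}()
	initial, err := waitForSettings(cache, 3, 5*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(initial)
}
```

Hard-coding the expected count ties the test to the set of settings written at startup, so the number would need updating if that set changes. That trade-off seems acceptable for a short-term fix.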

stevendanna avatar Sep 09 '24 10:09 stevendanna