redpanda
redpanda copied to clipboard
Log startup failure message at ERROR level
Currently startup failures are logged at INFO level, but there doesn't seem to be any reason to not log this critical failure message at error, as it only occurs at most once per run.
Fixes #5058.
These test failures are legit: BadLogLines picks up the error messages for tests where we expect startup to fail, so I'll have to add these to the allowlist.
Failure is https://github.com/redpanda-data/redpanda/issues/5495
both failures seem to be related to this. the second one definitely seems expected and ignoring the bad log line makes sense.
the first one (abort_requested_exception) seems fine too given its printed on shutdown except that maybe it's an opportunity to catch the abort exception earlier in some subsystem.
test_id: rptest.tests.cluster_features_test.FeaturesNodeJoinTest.test_old_node_join
--
| status: FAIL
| run time: 1 minute 3.929 seconds
|
|
| <BadLogLines nodes=docker-rp-24(1) example="ERROR 2022-07-17 19:42:35,506 [shard 0] redpanda::main - application.cc:255 - Failure during startup: seastar::abort_requested_exception (abort requested)">
| Traceback (most recent call last):
| File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
| data = self.run_test()
| File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
| return self.test_context.function(self.test)
| File "/root/tests/rptest/services/cluster.py", line 48, in wrapped
| self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
| File "/root/tests/rptest/services/redpanda.py", line 1124, in raise_on_bad_logs
| raise BadLogLines(bad_lines)
| rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-24(1) example="ERROR 2022-07-17 19:42:35,506 [shard 0] redpanda::main - application.cc:255 - Failure during startup: seastar::abort_requested_exception (abort requested)">
|
| --------------------------------------------------------------------------------
| test_id: rptest.tests.node_resize_test.NodeResizeTest.test_node_resize
| status: FAIL
| run time: 1 minute 18.763 seconds
|
|
| <BadLogLines nodes=docker-rp-21(2) example="ERROR 2022-07-17 19:48:15,779 [shard 0] redpanda::main - application.cc:255 - Failure during startup: cluster::configuration_invariants_changed (Configuration invariants changed. Expected: { version: 0, node_id: 1, core_count: 2 }, current: { version: 0, node_id: 1, core_count: 1 })">
| Traceback (most recent call last):
| File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
| data = self.run_test()
| File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
| return self.test_context.function(self.test)
| File "/root/tests/rptest/services/cluster.py", line 48, in wrapped
| self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
| File "/root/tests/rptest/services/redpanda.py", line 1124, in raise_on_bad_logs
| raise BadLogLines(bad_lines)
| rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-21(2) example="ERROR 2022-07-17 19:48:15,779 [shard 0] redpanda::main - application.cc:255 - Failure during startup: cluster::configuration_invariants_changed (Configuration invariants changed. Expected: { version: 0, node_id: 1, core_count: 2 }, current: { version: 0, node_id: 1, core_count: 1 })">
@dotnwat I've updated this with BadLogLines exclusions and it now passes everything.
I've also filed https://github.com/redpanda-data/redpanda/issues/5817 to address the poor error message in the fist case (i.e., "abort requested" is given, masking the true error cause).
lgtm. i restarted ci since this has been here for a couple weeks (my bad!)
Think this commit broke the test in nightly
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-9(1),docker-rp-20(1),docker-rp-10(1) example="ERROR 2022-08-25 00:52:35,355 [shard 0] redpanda::main - application.cc:255 - Failure during startup: std::invalid_argument (Validation errors in node config)">
but weird that the same test with the same error passed in the precommit ... :confused: