redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

Log startup failure message at ERROR level

Open travisdowns opened this issue 3 years ago • 5 comments

Currently startup failures are logged at INFO level, but there doesn't seem to be any reason to not log this critical failure message at error, as it only occurs at most once per run.

Fixes #5058.

travisdowns avatar Jun 07 '22 21:06 travisdowns

These test failures are legit: BadLogLines picks up the error messages for tests where we expect startup to fail, so I'll have to add these to the allowlist.

travisdowns avatar Jun 08 '22 06:06 travisdowns

Failure is https://github.com/redpanda-data/redpanda/issues/5495

dotnwat avatar Jul 17 '22 19:07 dotnwat

both failures seem to be related to this. the second one definitely seems expected and ignoring the bad log line makes sense.

the first one (abort_requested_exception) seems fine too given its printed on shutdown except that maybe it's an opportunity to catch the abort exception earlier in some subsystem.

test_id:    rptest.tests.cluster_features_test.FeaturesNodeJoinTest.test_old_node_join
--
  | status:     FAIL
  | run time:   1 minute 3.929 seconds
  |  
  |  
  | <BadLogLines nodes=docker-rp-24(1) example="ERROR 2022-07-17 19:42:35,506 [shard 0] redpanda::main - application.cc:255 - Failure during startup: seastar::abort_requested_exception (abort requested)">
  | Traceback (most recent call last):
  | File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
  | data = self.run_test()
  | File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
  | return self.test_context.function(self.test)
  | File "/root/tests/rptest/services/cluster.py", line 48, in wrapped
  | self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  | File "/root/tests/rptest/services/redpanda.py", line 1124, in raise_on_bad_logs
  | raise BadLogLines(bad_lines)
  | rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-24(1) example="ERROR 2022-07-17 19:42:35,506 [shard 0] redpanda::main - application.cc:255 - Failure during startup: seastar::abort_requested_exception (abort requested)">
  |  
  | --------------------------------------------------------------------------------
  | test_id:    rptest.tests.node_resize_test.NodeResizeTest.test_node_resize
  | status:     FAIL
  | run time:   1 minute 18.763 seconds
  |  
  |  
  | <BadLogLines nodes=docker-rp-21(2) example="ERROR 2022-07-17 19:48:15,779 [shard 0] redpanda::main - application.cc:255 - Failure during startup: cluster::configuration_invariants_changed (Configuration invariants changed. Expected: { version: 0, node_id: 1, core_count: 2 }, current: { version: 0, node_id: 1, core_count: 1 })">
  | Traceback (most recent call last):
  | File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 135, in run
  | data = self.run_test()
  | File "/usr/local/lib/python3.9/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
  | return self.test_context.function(self.test)
  | File "/root/tests/rptest/services/cluster.py", line 48, in wrapped
  | self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  | File "/root/tests/rptest/services/redpanda.py", line 1124, in raise_on_bad_logs
  | raise BadLogLines(bad_lines)
  | rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-21(2) example="ERROR 2022-07-17 19:48:15,779 [shard 0] redpanda::main - application.cc:255 - Failure during startup: cluster::configuration_invariants_changed (Configuration invariants changed. Expected: { version: 0, node_id: 1, core_count: 2 }, current: { version: 0, node_id: 1, core_count: 1 })">

dotnwat avatar Jul 17 '22 21:07 dotnwat

@dotnwat I've updated this with BadLogLines exclusions and it now passes everything.

travisdowns avatar Aug 03 '22 17:08 travisdowns

I've also filed https://github.com/redpanda-data/redpanda/issues/5817 to address the poor error message in the fist case (i.e., "abort requested" is given, masking the true error cause).

travisdowns avatar Aug 03 '22 17:08 travisdowns

lgtm. i restarted ci since this has been here for a couple weeks (my bad!)

dotnwat avatar Aug 18 '22 23:08 dotnwat

Think this commit broke the test in nightly

rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-9(1),docker-rp-20(1),docker-rp-10(1) example="ERROR 2022-08-25 00:52:35,355 [shard 0] redpanda::main - application.cc:255 - Failure during startup: std::invalid_argument (Validation errors in node config)">

but weird that the same test with the same error passed in the precommit ... :confused:

bharathv avatar Aug 25 '22 03:08 bharathv