
Test BadLogLines failures with uncaught raft::offset_monitor::wait_aborted (FranzGoVerifiableWithSiTest.test_si_with_timeboxed, PartitionBalancerTest.test_fuzz_admin_ops)

Open ajfabbri opened this issue 2 years ago • 3 comments

rptest.scale_tests.franz_go_verifiable_test.FranzGoVerifiableWithSiTest.test_si_with_timeboxed.segment_size=10485760
  <BadLogLines nodes=ip-172-31-58-10(3) example="ERROR 2022-06-14 07:28:06,896 [shard 0] rpc - Service handler threw an exception: raft::offset_monitor::wait_aborted (offset monitor wait aborted)">

This is similar to #4489; both cases have the offset monitor wait aborted exception.

Reproduced in CDT here.

ajfabbri, Jun 17 '22 01:06

Assigning to @ZeDRoman, as #4489 was something he was looking at.

piyushredpanda, Jun 17 '22 01:06

This uncaught exception is still in the code. I'm currently seeing it around the same time as I start a bunch of clients doing idempotent writes.

I have a mixture of caught wait_aborted exceptions coming from the id_allocator machinery, and then some uncaught ones making it up to the RPC handler that's logging these as ERROR:

WARN  2022-08-04 19:44:23,752 [shard 0] cluster - id_allocator_frontend.cc:252 - can not create {kafka_internal}/{id_allocator} topic - error: raft::offset_monitor::wait_aborted (offset monitor wait aborted)
WARN  2022-08-04 19:44:23,752 [shard 0] cluster - id_allocator_frontend.cc:70 - can't find {ns: {kafka_internal}, topic: {id_allocator}} in the metadata cache
WARN  2022-08-04 19:44:23,752 [shard 0] kafka - init_producer_id.cc:114 - failed to allocate pid, ec: cluster::errc:14
ERROR 2022-08-04 19:44:23,772 [shard 1] rpc - Service handler threw an exception: raft::offset_monitor::wait_aborted (offset monitor wait aborted)
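
To make the caught/uncaught split in the lines above easier to see, here is a minimal Python sketch that sorts `wait_aborted` log lines by level. The regular expression, the `classify` helper, and the assumption that only ERROR-level lines trip the BadLogLines check are illustrative guesses based on the examples in this thread, not the actual rptest implementation.

```python
import re

# Hypothetical pattern for the log layout seen in the lines above:
# LEVEL, date, time, [shard N], subsystem, " - ", message.
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:,]+) "
    r"\[shard (?P<shard>\d+)\] (?P<subsystem>\S+) - (?P<message>.*)$"
)

WAIT_ABORTED = "raft::offset_monitor::wait_aborted"


def classify(lines):
    """Split wait_aborted lines into handled (non-ERROR) and unhandled (ERROR).

    Assumption: a caught exception is logged below ERROR level, while one that
    escapes to the RPC service handler is logged at ERROR.
    """
    handled, unhandled = [], []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or WAIT_ABORTED not in m["message"]:
            continue  # not a parseable line, or unrelated to wait_aborted
        bucket = unhandled if m["level"] == "ERROR" else handled
        bucket.append(m.groupdict())
    return handled, unhandled


# The four lines quoted above, copied verbatim.
SAMPLE = [
    "WARN  2022-08-04 19:44:23,752 [shard 0] cluster - id_allocator_frontend.cc:252 - can not create {kafka_internal}/{id_allocator} topic - error: raft::offset_monitor::wait_aborted (offset monitor wait aborted)",
    "WARN  2022-08-04 19:44:23,752 [shard 0] cluster - id_allocator_frontend.cc:70 - can't find {ns: {kafka_internal}, topic: {id_allocator}} in the metadata cache",
    "WARN  2022-08-04 19:44:23,752 [shard 0] kafka - init_producer_id.cc:114 - failed to allocate pid, ec: cluster::errc:14",
    "ERROR 2022-08-04 19:44:23,772 [shard 1] rpc - Service handler threw an exception: raft::offset_monitor::wait_aborted (offset monitor wait aborted)",
]

handled, unhandled = classify(SAMPLE)
for entry in unhandled:
    print(f"uncaught on shard {entry['shard']} ({entry['subsystem']}): {entry['message']}")
```

On this sample the sketch reports one handled occurrence (the `cluster` WARN) and one unhandled occurrence (the `rpc` ERROR on shard 1), which matches the description of a mixture of caught and uncaught exceptions.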

jcsp, Aug 04 '22 20:08

FAIL test: PartitionBalancerTest.test_fuzz_admin_ops (2/37 runs) failure at 2022-08-05T07:48:34.288Z: <BadLogLines nodes=docker-rp-8(1) example="ERROR 2022-08-05 06:32:44,034 [shard 0] rpc - Service handler threw an exception: raft::offset_monitor::wait_aborted (offset monitor wait aborted)"> in job https://buildkite.com/redpanda/redpanda/builds/13659#01826c88-355c-4b07-a514-c884579adabb

jcsp, Aug 05 '22 13:08

Relevant discussion about the wait_aborted exception: https://github.com/redpanda-data/redpanda/pull/6367#discussion_r971280515

ztlpn, Sep 15 '22 12:09