
Semaphore timeout in `FranzGoVerifiableWithSiTest.test_si_with_timeboxed`

Open VladLazar opened this issue 3 years ago • 1 comments

A semaphore timed out in the NTP archiver upload loop, and the exception bubbled up and stopped the loop:

upload loop error: seastar::semaphore_timed_out (Semaphore timedout)

The latest occurrence is from Friday (5 Aug), but there are more in the past: https://buildkite.com/redpanda/vtools/builds/3123#01826bf6-06ed-43d7-b0cf-5124782ac879.

VladLazar avatar Aug 08 '22 13:08 VladLazar

The important clue in the error message is that the semaphore is not named. All the semaphores explicitly created by redpanda are named, so the timeout must come from some other concurrency primitive that uses seastar::semaphore under the hood. After some digging, I found the segment_read_lock in the archiver, which matches that description.

The reason for the failure is that the test causes contention on the lock. In the current configuration, every segment fetch from SI causes a cache eviction (see https://github.com/redpanda-data/redpanda/pull/5915). The fix in that PR might help here too.

The underlying problem of not handling semaphore timeouts still remains, though. Perhaps we should catch the exception here and silence it, as we do for other exceptions in the upload loop.
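
To illustrate the idea, here is a minimal sketch in plain standard C++, not the actual archiver code. It assumes a simplified synchronous loop; `lock_timed_out`, `loop_stats` and `run_upload_loop` are hypothetical names, with `lock_timed_out` standing in for seastar::semaphore_timed_out. The point is only that a timeout is caught per iteration, so the loop keeps running, while unrelated exceptions still propagate.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for seastar::semaphore_timed_out.
struct lock_timed_out : std::runtime_error {
    lock_timed_out() : std::runtime_error("Semaphore timedout") {}
};

// Counters so the caller can see what happened in each iteration.
struct loop_stats {
    int completed = 0;
    int timed_out = 0;
};

// Simplified upload loop (assumption: the real loop is asynchronous).
// `upload_once` represents one pass that may need the segment read lock
// and throws lock_timed_out when the lock cannot be acquired in time.
loop_stats run_upload_loop(const std::function<void(int)>& upload_once,
                           int iterations) {
    loop_stats stats;
    for (int i = 0; i < iterations; ++i) {
        try {
            upload_once(i);
            ++stats.completed;
        } catch (const lock_timed_out&) {
            // Contention on the lock is treated as transient: record it
            // and retry on the next iteration instead of stopping the loop.
            ++stats.timed_out;
        }
        // Any other exception type is deliberately not caught here.
    }
    return stats;
}
```

With a callback that times out on every odd iteration, five iterations yield three completed passes and two recorded timeouts, and the loop runs to the end. Whether the timeout should be logged at warning level or counted in a metric is left open in this sketch.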

VladLazar avatar Aug 09 '22 14:08 VladLazar