druid icon indicating copy to clipboard operation
druid copied to clipboard

Fix flaky tests in SeekableStreamSupervisorStateTest

Open abhishekagarwal87 opened this issue 3 years ago • 2 comments

This PR attempts to fix a few flaky tests and add more information to the assertion failure message

  • SeekableStreamSupervisorStateTest - Tests in this class rely on a latch to know when a metric has been emitted. However, the latch is counted down for all the emitted metrics and not just the metric we want to test the emission of. This is why tests wait for the latch to be counted down twice. Except in one place, probably by accident, and that was a flaky test. Now with this change, we wait for latch to be released only for the metric being tested. I am also using the same latch mechanism to track emission of notice time metric (that was another flaky test)
  • ITTLS test - We now retry requests for ClosedChannelException errors as well. This may fix the flakiness.
  • Added more information to the assertion failure message in IndexTaskTest and CompactionTaskParallelRunTest.
  • AbstractQueryResourceTestClient - Added retries for running queries.

This PR has:

  • [ ] been self-reviewed.

abhishekagarwal87 avatar Aug 08 '22 13:08 abhishekagarwal87

@abhishekagarwal87, thanks for the fix! Really excited to see another source of error removed.

The changes look straight-forward enough. However, it is not clear to this newbie what they're trying to fix. Something about a latch? Can you add a note to explain the underlying problem a bit more?

paul-rogers avatar Aug 08 '22 16:08 paul-rogers

@paul-rogers - I have added more explanation now in the PR. can you review again?

abhishekagarwal87 avatar Aug 12 '22 05:08 abhishekagarwal87