kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-17455: fix stuck producer when throttling or retrying

Open coltmcnealy-lh opened this issue 1 year ago • 2 comments

fixes stuck producer by polling again after pollDelayMs in NetworkUtils#awaitReady()

coltmcnealy-lh avatar Oct 17 '24 15:10 coltmcnealy-lh

We are soaking this change right now. If it doesn't cause other problems, I will need help with unit tests. Addresses this ticket: https://issues.apache.org/jira/browse/KAFKA-17455

coltmcnealy-lh avatar Oct 17 '24 15:10 coltmcnealy-lh

@dajac any help would be appreciated with the tests. Not sure if what I attempted to do was clear or not, but (obviously) the changes I made to MockClient.java broke tons of stuff.

The reason for those changes was that since poll() didn't advance the time, it got stuck in an infinite loop before these changes. There must be a better way to test this.

coltmcnealy-lh avatar Oct 17 '24 23:10 coltmcnealy-lh

@dajac writes:

I suppose that the test passes with new MockTime(1L) because the new time is not used by the client. The client must be created after the time is set. Would it work? I would also prefer this over adding the sleep in the mock client.

It does pass; however, it also passes on trunk, which means the test has no signal. I am fine with making that change however @mjsax told me it was best to craft a test that fails before the patch (admittedly, this was tricky).

coltmcnealy-lh avatar Jan 09 '25 02:01 coltmcnealy-lh

@dajac writes:

I suppose that the test passes with new MockTime(1L) because the new time is not used by the client. The client must be created after the time is set. Would it work? I would also prefer this over adding the sleep in the mock client.

It does pass; however, it also passes on trunk, which means the test has no signal. I am fine with making that change however @mjsax told me it was best to craft a test that fails before the patch (admittedly, this was tricky).

@coltmcnealy-lh Thanks for trying. I also gave it a shot and I could not find a better way to do it. I think that we can keep your unit test as you proposed it. I left a minor comment.

dajac avatar Jan 09 '25 08:01 dajac

Thanks for the fix. Merged to trunk and cherry-picked to 4.0, 3.9, and 3.8 branches.

mjsax avatar Jan 09 '25 19:01 mjsax