kafka icon indicating copy to clipboard operation
kafka copied to clipboard

KAFKA-17645: KIP-1052: Enable warmup in producer performance test

Open matt-welch opened this issue 1 year ago • 2 comments

In order to better analyze steady-state performance of Kafka, this PR enables a warmup in the Producer Performance test. The warmup duration is specified as a number of records that are a subset of the total numRecords. If warmup records is greater than 0, the warmup is represented by a second Stats object which holds warmup results. Once warmup records have been exhausted, the test switches to using the existing Stats object. At end of test, if warmup was enabled, the summary of the whole test (warump + steady state) is printed followed by the summary of the steady-state portion of the test. If no warmup is used, summary prints don't change from existing behavior. This contribution is an original work and is licensed to the Kafka project under the Apache license

Testing strategy comprises new Java unit tests added to ProducerPerformanceTests.java.

Committer Checklist (excluded from commit message)

  • [X] Verify design and implementation
  • [X] Verify test coverage and CI build status
  • [ ] Verify documentation (including upgrade notes)

matt-welch avatar Oct 01 '24 17:10 matt-welch

I was able to reproduce and fix the performance issue. I had failed to create the record callback for messages in steady state :( It should be fixed in the latest push, but I still think the logic in that conditional can be improved.

re: checking warmup records before num records, I've added an exception for when numRecords <= 0.

FYI, I will be out for the next week, but thanks so much for looking at this patch!!

matt-welch avatar Oct 03 '24 23:10 matt-welch

Thanks for the latest changes. Results look much better now:

2000000 records sent, 9998.350272 records/sec (0.00 MB/sec), 0.58 ms avg latency, 417.00 ms max latency, 417 ms 50th, 417 ms 95th, 417 ms 99th, 417 ms 99.9th.
1500000 steady state records sent, 9997.733847 records/sec (0.95 MB/sec), 0.52 ms avg latency, 9.00 ms max latency, 1 ms 50th, 1 ms 95th, 1 ms 99th, 2 ms 99.9th.

There is still an issue with the overall throughput (0.00 MB/sec) though. I also see some checkstyle errors after rebasing.

FYI, I will be out for the next week, but thanks so much for looking at this patch!

No rush, take your time.

fvaleri avatar Oct 04 '24 07:10 fvaleri

Hi @matt-welch, there is one checkstyle error and you also probably need to rebase.

fvaleri avatar Oct 21 '24 14:10 fvaleri

@matt-welch sorry for the delay. Latest changes LGTM. Please, address the comments from @chia7712 and then I think we are good. Thanks.

PS: Feel free to ping people for review when you do changes, as we don't get notifications for new commits.

fvaleri avatar Nov 04 '24 08:11 fvaleri

Hi @chia7712 @fvaleri . My latest commit refactors the use of a warmupStats object and a main stats object in favor of having one stats that covers the whole test and a second object, steadyStateStats, that will be used during steady state operation.
In the case of a steady state record, I've added a new callback that will write into both data objects during steady state. This new approach seems like a simpler, cleaner design and is much easier to understand what is happening. I've done some testing on both designs and they appear to perform similarly. My only reservation is that the producer will now have a larger memory requirement due to the redundant recording of latency data into the two objects. Let me know what you think.

matt-welch avatar Dec 11 '24 23:12 matt-welch

@matt-welch thanks for your update. I will take a look later!

chia7712 avatar Dec 12 '24 23:12 chia7712

Hi @chia7712 and @fvaleri. I've made some updates to this PR in my latest commits including some sample output for review. Please let me know if there's anything else I can or should do to complete the PR or answer any concerns about it. Thanks!

matt-welch avatar Feb 10 '25 19:02 matt-welch

@matt-welch sorry for late review. could you please sync the trunk?

chia7712 avatar Feb 26 '25 10:02 chia7712

My latest commit moves DEFAULT_REPORTING_INTERVAL_MS into Stats and removes the DEBUG print about config.warmupRecords. Please let me know if there are any additional changes needed on this patch. Thanks @chia7712 @kirktrue !

matt-welch avatar Apr 18 '25 21:04 matt-welch

@chia7712 @kirktrue I think I've addressed the issues raised, but does anyone have any more input on this patch?

matt-welch avatar May 08 '25 15:05 matt-welch

Hi @chia7712 @kirktrue .
Does anyone have additional feedback with this PR? There doesn't appear to be any updates needed to docs/ since that archive of documentation doesn't mention the producer perf test as far as I could tell. Are there any other items that need updating? Thanks!

matt-welch avatar May 22 '25 23:05 matt-welch

Hi @chia7712. I've updated my latest commit with your suggestions and I think it's a bit more readable now.
Let me know if anyone has any more feedback.

matt-welch avatar May 30 '25 22:05 matt-welch

Hi again @chia7712 :) Just checking in to see if there are any more issues with this patch. Please let me know if there's anything I can do to improve it.

matt-welch avatar Jun 27 '25 22:06 matt-welch

@matt-welch thanks for your contribution and patience.

chia7712 avatar Jul 24 '25 05:07 chia7712

Thank you @fvaleri @kirktrue @chia7712 for all your help and support getting this approved!

matt-welch avatar Jul 24 '25 15:07 matt-welch