KAFKA-17730: Fix ReplicaFetcherThreadBenchmark
Running ReplicaFetcherThreadBenchmark triggers a NotLeaderOrFollowerException, which is thrown here: https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/RemoteLeaderEndPoint.scala#L188
The current fix catches and ignores the NotLeaderOrFollowerException.
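For readers unfamiliar with the pattern, here is a minimal sketch of what "catch and ignore" could look like. It is not the actual diff in this PR: the `IgnoreNotLeaderSketch` class, the `FetchWork` callback, and the locally declared exception (a stand-in for `org.apache.kafka.common.errors.NotLeaderOrFollowerException`) are all hypothetical names used only for illustration.

```java
// Sketch only: illustrates catch-and-ignore, not the real benchmark code.
class IgnoreNotLeaderSketch {

    // Hypothetical stand-in for
    // org.apache.kafka.common.errors.NotLeaderOrFollowerException,
    // declared locally so the example compiles without Kafka on the classpath.
    static class NotLeaderOrFollowerException extends RuntimeException {
        NotLeaderOrFollowerException(String message) {
            super(message);
        }
    }

    // Hypothetical callback representing one unit of fetcher work.
    @FunctionalInterface
    interface FetchWork {
        void run();
    }

    // Runs the work and swallows only NotLeaderOrFollowerException.
    // Returns true if the work completed, false if the exception was ignored.
    // Any other exception still propagates to the caller.
    static boolean runIgnoringNotLeader(FetchWork work) {
        try {
            work.run();
            return true;
        } catch (NotLeaderOrFollowerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean completed = runIgnoringNotLeader(() -> { });
        boolean ignored = runIgnoringNotLeader(() -> {
            throw new NotLeaderOrFollowerException("not leader");
        });
        System.out.println(completed + " " + ignored); // prints "true false"
    }
}
```

Catching only the one exception type keeps unrelated failures visible. Whether swallowing it hides a real setup problem in the benchmark is the open question raised for the reviewers.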
Local benchmark result:
./jmh-benchmarks/jmh.sh ReplicaFetcherThreadBenchmark
running gradlew :jmh-benchmarks:clean :jmh-benchmarks:shadowJar
> Configure project :
Starting build with version 4.0.0-SNAPSHOT (commit id 94c7ede7) using Gradle 8.10, Java 17 and Scala 2.13.15
Build properties: ignoreFailures=false, maxParallelForks=6, maxScalacThreads=6, maxTestRetries=0
Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
For more on this, please refer to https://docs.gradle.org/8.10/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.
BUILD SUCCESSFUL in 22s
96 actionable tasks: 23 executed, 73 up-to-date
gradle build done
running JMH with args: ReplicaFetcherThreadBenchmark
# JMH version: 1.37
# VM version: JDK 17.0.12, OpenJDK 64-Bit Server VM, 17.0.12+7-Ubuntu-1ubuntu222.04
# VM invoker: /usr/lib/jvm/java-17-openjdk-amd64/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 15 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher
# Parameters: (partitionCount = 100)
# Run progress: 0,00% complete, ETA 00:13:20
# Fork: 1 of 1
# Warmup Iteration 1: [2024-10-10 13:36:57,811] WARN The new 'consumer' rebalance protocol is only supported in KRaft cluster with the new group coordinator. (kafka.server.KafkaConfig:70)
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
1929,906 ns/op
# Warmup Iteration 2: 1860,040 ns/op
# Warmup Iteration 3: 1879,765 ns/op
# Warmup Iteration 4: 1884,042 ns/op
# Warmup Iteration 5: 1875,712 ns/op
Iteration 1: 1877,666 ns/op
Iteration 2: 1885,357 ns/op
Iteration 3: 1876,356 ns/op
Iteration 4: 1874,775 ns/op
Iteration 5: 1875,129 ns/op
Iteration 6: 1872,721 ns/op
Iteration 7: 1876,337 ns/op
Iteration 8: 1890,266 ns/op
Iteration 9: 1870,369 ns/op
Iteration 10: 1885,525 ns/op
Iteration 11: 1989,414 ns/op
Iteration 12: 1912,892 ns/op
Iteration 13: 1922,298 ns/op
Iteration 14: 1902,687 ns/op
Iteration 15: 1906,352 ns/op
Result "org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher":
1894,543 ±(99.9%) 32,917 ns/op [Average]
(min, avg, max) = (1870,369, 1894,543, 1989,414), stdev = 30,790
CI (99.9%): [1861,626, 1927,460] (assumes normal distribution)
# JMH version: 1.37
# VM version: JDK 17.0.12, OpenJDK 64-Bit Server VM, 17.0.12+7-Ubuntu-1ubuntu222.04
# VM invoker: /usr/lib/jvm/java-17-openjdk-amd64/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 15 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher
# Parameters: (partitionCount = 500)
# Run progress: 25,00% complete, ETA 00:10:12
# Fork: 1 of 1
# Warmup Iteration 1: [2024-10-10 13:40:22,069] WARN The new 'consumer' rebalance protocol is only supported in KRaft cluster with the new group coordinator. (kafka.server.KafkaConfig:70)
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
8464,782 ns/op
# Warmup Iteration 2: 8192,703 ns/op
# Warmup Iteration 3: 8162,707 ns/op
# Warmup Iteration 4: 8122,797 ns/op
# Warmup Iteration 5: 8169,713 ns/op
Iteration 1: 8057,133 ns/op
Iteration 2: 8053,061 ns/op
Iteration 3: 8077,125 ns/op
Iteration 4: 8039,068 ns/op
Iteration 5: 8024,524 ns/op
Iteration 6: 8035,134 ns/op
Iteration 7: 8013,353 ns/op
Iteration 8: 8018,225 ns/op
Iteration 9: 8021,750 ns/op
Iteration 10: 8053,567 ns/op
Iteration 11: 8047,978 ns/op
Iteration 12: 8515,976 ns/op
Iteration 13: 8523,523 ns/op
Iteration 14: 8521,076 ns/op
Iteration 15: 8524,231 ns/op
Result "org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher":
8168,382 ±(99.9%) 236,112 ns/op [Average]
(min, avg, max) = (8013,353, 8168,382, 8524,231), stdev = 220,860
CI (99.9%): [7932,269, 8404,494] (assumes normal distribution)
# JMH version: 1.37
# VM version: JDK 17.0.12, OpenJDK 64-Bit Server VM, 17.0.12+7-Ubuntu-1ubuntu222.04
# VM invoker: /usr/lib/jvm/java-17-openjdk-amd64/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 15 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher
# Parameters: (partitionCount = 1000)
# Run progress: 50,00% complete, ETA 00:06:56
# Fork: 1 of 1
# Warmup Iteration 1: [2024-10-10 13:43:53,904] WARN The new 'consumer' rebalance protocol is only supported in KRaft cluster with the new group coordinator. (kafka.server.KafkaConfig:70)
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
16887,223 ns/op
# Warmup Iteration 2: 16481,102 ns/op
# Warmup Iteration 3: 16141,360 ns/op
# Warmup Iteration 4: 16114,730 ns/op
# Warmup Iteration 5: 16072,493 ns/op
Iteration 1: 15944,404 ns/op
Iteration 2: 16098,280 ns/op
Iteration 3: 15944,495 ns/op
Iteration 4: 16056,134 ns/op
Iteration 5: 15999,214 ns/op
Iteration 6: 16086,102 ns/op
Iteration 7: 16064,142 ns/op
Iteration 8: 16058,817 ns/op
Iteration 9: 16059,667 ns/op
Iteration 10: 16082,960 ns/op
Iteration 11: 16037,771 ns/op
Iteration 12: 15971,635 ns/op
Iteration 13: 15983,740 ns/op
Iteration 14: 15946,546 ns/op
Iteration 15: 16033,504 ns/op
Result "org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher":
16024,494 ±(99.9%) 58,477 ns/op [Average]
(min, avg, max) = (15944,404, 16024,494, 16098,280), stdev = 54,699
CI (99.9%): [15966,017, 16082,971] (assumes normal distribution)
# JMH version: 1.37
# VM version: JDK 17.0.12, OpenJDK 64-Bit Server VM, 17.0.12+7-Ubuntu-1ubuntu222.04
# VM invoker: /usr/lib/jvm/java-17-openjdk-amd64/bin/java
# VM options: <none>
# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
# Warmup: 5 iterations, 10 s each
# Measurement: 15 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher
# Parameters: (partitionCount = 5000)
# Run progress: 75,00% complete, ETA 00:03:32
# Fork: 1 of 1
# Warmup Iteration 1: [2024-10-10 13:47:35,385] WARN The new 'consumer' rebalance protocol is only supported in KRaft cluster with the new group coordinator. (kafka.server.KafkaConfig:70)
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
90793,927 ns/op
# Warmup Iteration 2: 87362,350 ns/op
# Warmup Iteration 3: 86543,760 ns/op
# Warmup Iteration 4: 86226,549 ns/op
# Warmup Iteration 5: 87073,001 ns/op
Iteration 1: 87169,335 ns/op
Iteration 2: 87936,322 ns/op
Iteration 3: 87299,357 ns/op
Iteration 4: 88002,498 ns/op
Iteration 5: 86985,872 ns/op
Iteration 6: 87270,992 ns/op
Iteration 7: 86482,991 ns/op
Iteration 8: 86770,086 ns/op
Iteration 9: 85933,541 ns/op
Iteration 10: 85948,712 ns/op
Iteration 11: 87414,002 ns/op
Iteration 12: 87037,239 ns/op
Iteration 13: 87365,601 ns/op
Iteration 14: 87367,632 ns/op
Iteration 15: 87196,675 ns/op
Result "org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.testFetcher":
87078,724 ±(99.9%) 640,395 ns/op [Average]
(min, avg, max) = (85933,541, 87078,724, 88002,498), stdev = 599,026
CI (99.9%): [86438,329, 87719,119] (assumes normal distribution)
# Run complete. Total time: 00:16:42
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
NOTE: Current JVM experimentally supports Compiler Blackholes, and they are in use. Please exercise
extra caution when trusting the results, look into the generated code to check the benchmark still
works, and factor in a small probability of new VM bugs. Additionally, while comparisons between
different JVMs are already problematic, the performance difference caused by different Blackhole
modes can be very significant. Please make sure you use the consistent Blackhole mode for comparisons.
Benchmark                                  (partitionCount)  Mode  Cnt      Score      Error  Units
ReplicaFetcherThreadBenchmark.testFetcher               100  avgt   15   1894,543 ±   32,917  ns/op
ReplicaFetcherThreadBenchmark.testFetcher               500  avgt   15   8168,382 ±  236,112  ns/op
ReplicaFetcherThreadBenchmark.testFetcher              1000  avgt   15  16024,494 ±   58,477  ns/op
ReplicaFetcherThreadBenchmark.testFetcher              5000  avgt   15  87078,724 ±  640,395  ns/op
JMH benchmarks done
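One way to read the summary is to normalize each score by its partitionCount. The sketch below does that under two assumptions: the comma in the scores is a locale decimal separator (so `1894,543` means 1894.543 ns/op), and per-partition cost is a useful comparison for this benchmark. The `PerPartitionCost` class and its methods are hypothetical helpers, not part of Kafka or JMH.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Sketch only: normalizes the JMH summary scores by partition count.
class PerPartitionCost {

    // Parses a score printed with a comma decimal separator, e.g. "1894,543".
    // Locale.GERMANY is used here only because it treats ',' as the decimal mark.
    static double parseScore(String text) {
        try {
            return NumberFormat.getInstance(Locale.GERMANY).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a number: " + text, e);
        }
    }

    // Average time per operation divided by the number of partitions.
    static double perPartition(double nsPerOp, int partitionCount) {
        return nsPerOp / partitionCount;
    }

    public static void main(String[] args) {
        // Scores and partition counts copied from the summary table above.
        String[] scores = {"1894,543", "8168,382", "16024,494", "87078,724"};
        int[] partitionCounts = {100, 500, 1000, 5000};
        for (int i = 0; i < scores.length; i++) {
            double cost = perPartition(parseScore(scores[i]), partitionCounts[i]);
            System.out.printf(Locale.ROOT, "%d partitions: %.2f ns per partition%n",
                    partitionCounts[i], cost);
        }
    }
}
```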
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
@mimaison @chia7712 Could you please tell me if ignoring the exception is correct or if we need to look into the cause of the exception?
@mimaison @chia7712 please give me feedback.