Replace the ConcurrentOpenHashMap as much as possible
Search before asking
- [X] I searched in the issues and found nothing similar.
Motivation
There was a discussion back in Sep. 2023 about replacing the customized map implementations such as ConcurrentOpenHashMap. In this issue, I'd like to focus on ConcurrentOpenHashMap.
Here is the only advantage of this map.
Provides similar methods as a {@code ConcurrentMap<K,V>} but since it's an open hash map with linear probing, no node allocations are required to store the values.
Let me count the disadvantages.
1. Bad performance
This map was added in the initial commit (in 2016). However, the implementation is essentially the Java 7 design of ConcurrentHashMap, which relies on segment-based locking. The Java team discarded that design in Java 8.
https://github.com/apache/pulsar/pull/20647/#issuecomment-1607257960 ran a benchmark and found the performance was much worse than the ConcurrentHashMap shipped with the Java standard library. We can also look up the pros of the Java 8 design on the web, or simply ask ChatGPT.
Besides, the frequently used keys() and values() methods simply copy all keys and values into a new list, while ConcurrentHashMap returns thread-safe internal views and lets callers decide whether to make a copy.
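To make the difference concrete, here is a small self-contained sketch (not taken from Pulsar code) showing that ConcurrentHashMap hands out live views and only copies when the caller explicitly asks for one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ViewVsCopy {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // Live, weakly consistent view: no copying, later updates are visible.
        Set<String> keysView = map.keySet();

        // Copy only where a snapshot is actually needed.
        List<String> keysCopy = new ArrayList<>(map.keySet());
        List<Integer> valuesCopy = new ArrayList<>(map.values());

        map.put("c", 3);
        System.out.println(keysView.size());   // 3: the view sees the new entry
        System.out.println(keysCopy.size());   // 2: the snapshot does not
        System.out.println(valuesCopy.size()); // 2
    }
}
```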
That said, proving the performance is worse than ConcurrentHashMap requires more tests and research, so this is the least important reason.
2. Lack of updates
This class is rarely updated. The only change I can remember is the shrink support added two years ago: https://github.com/apache/pulsar/pull/14663
From https://github.com/apache/bookkeeper/pull/3061, we can see the motivation was the frequent full GCs caused by this implementation. However, adding shrink support makes the map even harder to use. There are already many parameters to tune; see its builder:
public static class Builder<K, V> {
    int expectedItems = DefaultExpectedItems;
    int concurrencyLevel = DefaultConcurrencyLevel;
    float mapFillFactor = DefaultMapFillFactor;
    float mapIdleFactor = DefaultMapIdleFactor;
    float expandFactor = DefaultExpandFactor;
    float shrinkFactor = DefaultShrinkFactor;
    boolean autoShrink = DefaultAutoShrink;
    // ...
}
There are many xxxFactor parameters plus the concurrency level. It's hard to pick proper default values, and all these knobs make the class hard for new developers to modify.
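To illustrate how much ceremony this adds at call sites, here is a hedged sketch comparing the two construction styles. The builder setter names and the values below are assumptions inferred from the fields above, and the import path assumes the class lives in pulsar-common; treat it as illustration, not exact Pulsar code.

```java
import java.util.concurrent.ConcurrentHashMap;
import org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap;

public class ConstructionComparison {
    public static void main(String[] args) {
        // Builder-based construction: every knob must be understood (or guessed)
        // by the caller. Setter names and values here are assumed for illustration.
        ConcurrentOpenHashMap<String, String> openMap =
                ConcurrentOpenHashMap.<String, String>newBuilder()
                        .expectedItems(256)
                        .concurrencyLevel(16)
                        .autoShrink(true)
                        .build();

        // The official map needs at most an initial-capacity hint; resizing and
        // concurrency are handled internally since the Java 8 rewrite.
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>(256);

        openMap.put("key", "value");
        map.put("key", "value");
        System.out.println(openMap.size() + " vs " + map.size());
    }
}
```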
3. Bad debug experience
When I debugged the topics maintained in a BrokerService, the debugger showed 16 sections, and I had to iterate over all of these sections and expand each table array to find the target topic. Compare that with the official ConcurrentHashMap (I replaced it locally), where the debugger displays the entries directly.
Besides, it's even harder to analyze in the heap dump.
4. Not friendly to new developers
Many places just use it as a plain concurrent hash map. Why should new developers avoid the official ConcurrentHashMap, which is developed and continuously improved by a professional team? Just to reduce node allocations, even as JVM GC keeps improving?
As I've mentioned, this class was likely introduced in the Java 7 era. Now the minimum required Java version on the broker side is 17. We have ZGC, we have Shenandoah, and many more JVM developers keep building better collectors. I doubt this advantage still makes sense.
I cannot think of a reason to choose this hard-to-maintain class over the well-maintained official ConcurrentHashMap.
For example, when I maintained KoP, I ran into a deadlock in ConcurrentLongHashMap (probably a similar implementation): https://github.com/streamnative/kop/pull/620. It was hard to tell whether that case had been fixed, so I had to switch to the official ConcurrentHashMap.
Solution
Replace ConcurrentOpenHashMap with the official Java ConcurrentHashMap.
Alternatives
N/A
Anything else?
Java Concurrent HashMap Improvements over the years https://medium.com/@vikas.taank_40391/java-concurrent-hashmap-improvements-over-the-years-8d8b7be6ce37
Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
PR list
- [ ] https://github.com/apache/pulsar/pull/23217
https://github.com/apache/pulsar/pull/12729 shows that ConcurrentOpenHashMap allows null values. It adds a public removeNullValue method to handle them, which is very error-prone.
Here is an example:
import java.util.concurrent.ConcurrentHashMap;
import org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap;

// The official map inserts nothing when the mapping function returns null.
final var map = new ConcurrentHashMap<String, String>();
map.computeIfAbsent("A", __ -> null);
System.out.println(map.size());

// ConcurrentOpenHashMap stores a null value for the key.
final var map2 = new ConcurrentOpenHashMap<String, String>();
map2.computeIfAbsent("A", __ -> null);
System.out.println(map2.size());
Outputs:
0
1
The caller of ConcurrentOpenHashMap#computeIfAbsent needs to manually check the return value and call removeNullValue to handle the case where a null value was inserted. What a terrible API design :(
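For completeness, this is roughly what that caller-side cleanup looks like. The exact removeNullValue signature is an assumption based on the PR description, so treat this as a sketch rather than the real API:

```java
import org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap;

public class NullValueCleanup {
    // Stand-in for a lookup that may legitimately find nothing.
    private static String lookup(String key) {
        return null;
    }

    public static void main(String[] args) {
        ConcurrentOpenHashMap<String, String> map =
                ConcurrentOpenHashMap.<String, String>newBuilder().build();

        String value = map.computeIfAbsent("A", NullValueCleanup::lookup);
        if (value == null) {
            // Without this extra step, a null value stays in the map and size() is wrong.
            // The single-key signature here is assumed, not verified.
            map.removeNullValue("A");
        }
        System.out.println(map.size()); // expected 0 after the cleanup
    }
}
```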
I added some benchmarks with different thread counts (by modifying the BenchmarkState.numThreads field).
TL;DR
There is no reason to use ConcurrentOpenHashMap for the sake of performance.
computeIfAbsent
https://gist.github.com/BewareMyPower/1083937e30cb0f5a63be74c4ee9c5559
- Generate 10000 keys in range [0, 100) in advance
- Call computeIfAbsent in N threads on all keys (a rough sketch of the harness follows)
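The gist above contains the actual benchmark; the following is only a minimal sketch of how such a JMH benchmark can be structured (class and method names here are illustrative, not copied from the gist, and only the ConcurrentHashMap variant is shown):

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
public class ComputeIfAbsentBenchmark {

    private static final int NUM_KEYS = 10_000;
    private final int[] keys = new int[NUM_KEYS];
    private ConcurrentHashMap<Integer, Integer> map;

    @Setup
    public void setup() {
        final Random random = new Random(0);
        for (int i = 0; i < NUM_KEYS; i++) {
            keys[i] = random.nextInt(100); // 10000 keys, each in [0, 100)
        }
        map = new ConcurrentHashMap<>();
    }

    @Benchmark
    @Threads(2) // re-run with 2, 4, 8 and 16 to reproduce the tables below
    public void testOfficialConcurrentHashMap() {
        for (int key : keys) {
            map.computeIfAbsent(key, k -> k);
        }
    }
}
```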
2 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapTest.testConcurrentOpenHashMap | thrpt | 5 | 2875.969 | ± 189.894 | ops/s |
| ConcurrentHashMapTest.testOfficialConcurrentHashMap | thrpt | 5 | 5252.836 | ± 1312.977 | ops/s |
| ConcurrentHashMapTest.testSynchronizedHashMap | thrpt | 5 | 996.854 | ± 21.998 | ops/s |
ConcurrentHashMap is 1.8x faster than the ConcurrentOpenHashMap.
4 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapTest.testConcurrentOpenHashMap | thrpt | 5 | 1460.855 | ± 152.361 | ops/s |
| ConcurrentHashMapTest.testOfficialConcurrentHashMap | thrpt | 5 | 3750.708 | ± 1324.286 | ops/s |
| ConcurrentHashMapTest.testSynchronizedHashMap | thrpt | 5 | 270.024 | ± 11.736 | ops/s |
ConcurrentHashMap is 2.5x faster than the ConcurrentOpenHashMap.
8 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapTest.testConcurrentOpenHashMap | thrpt | 5 | 648.256 | ± 278.038 | ops/s |
| ConcurrentHashMapTest.testOfficialConcurrentHashMap | thrpt | 5 | 2383.684 | ± 663.103 | ops/s |
| ConcurrentHashMapTest.testSynchronizedHashMap | thrpt | 5 | 130.503 | ± 4.253 | ops/s |
ConcurrentHashMap is 3.6x faster than ConcurrentOpenHashMap.
16 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapTest.testConcurrentOpenHashMap | thrpt | 5 | 259.008 | ± 204.781 | ops/s |
| ConcurrentHashMapTest.testOfficialConcurrentHashMap | thrpt | 5 | 1443.469 | ± 368.230 | ops/s |
| ConcurrentHashMapTest.testSynchronizedHashMap | thrpt | 5 | 59.875 | ± 2.984 | ops/s |
ConcurrentHashMap is 5.5x faster than ConcurrentOpenHashMap.
Conclusion
ConcurrentHashMap clearly performs much better, and the gap widens as the number of threads increases.
get
https://gist.github.com/BewareMyPower/b6254ec64932c3cf86b6ec7433a631da
- Generate 5000 random keys (possibly duplicated) in range [0, 10000) and insert them
- Call get on each key from 0 to 10000 (inclusive) in N threads
2 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapGetTest.testConcurrentOpenHashMap | thrpt | 5 | 3741.910 | ± 206.337 | ops/s |
| ConcurrentHashMapGetTest.testOfficialConcurrentHashMap | thrpt | 5 | 5807.285 | ± 1896.396 | ops/s |
| ConcurrentHashMapGetTest.testSynchronizedHashMap | thrpt | 5 | 870.238 | ± 66.370 | ops/s |
4 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapGetTest.testConcurrentOpenHashMap | thrpt | 5 | 3485.099 | ± 269.270 | ops/s |
| ConcurrentHashMapGetTest.testOfficialConcurrentHashMap | thrpt | 5 | 3777.292 | ± 1120.207 | ops/s |
| ConcurrentHashMapGetTest.testSynchronizedHashMap | thrpt | 5 | 256.690 | ± 16.778 | ops/s |
8 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapGetTest.testConcurrentOpenHashMap | thrpt | 5 | 2318.738 | ± 135.056 | ops/s |
| ConcurrentHashMapGetTest.testOfficialConcurrentHashMap | thrpt | 5 | 2568.236 | ± 828.029 | ops/s |
| ConcurrentHashMapGetTest.testSynchronizedHashMap | thrpt | 5 | 119.749 | ± 2.051 | ops/s |
16 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapGetTest.testConcurrentOpenHashMap | thrpt | 5 | 1711.945 | ± 128.223 | ops/s |
| ConcurrentHashMapGetTest.testOfficialConcurrentHashMap | thrpt | 5 | 1459.255 | ± 431.960 | ops/s |
| ConcurrentHashMapGetTest.testSynchronizedHashMap | thrpt | 5 | 59.398 | ± 2.085 | ops/s |
32 threads
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ConcurrentHashMapGetTest.testConcurrentOpenHashMap | thrpt | 5 | 863.625 | ± 36.957 | ops/s |
| ConcurrentHashMapGetTest.testOfficialConcurrentHashMap | thrpt | 5 | 1001.633 | ± 222.766 | ops/s |
| ConcurrentHashMapGetTest.testSynchronizedHashMap | thrpt | 5 | 28.798 | ± 4.943 | ops/s |
Conclusion
When the number of threads is small, ConcurrentHashMap performs better, though the two are close. Notably, with 16 threads ConcurrentOpenHashMap performs better, which might be because its default concurrency level is 16. With 32 threads, ConcurrentHashMap performs better again.
I support this initiative. It is a little bit scary to change this everywhere; maybe we can have a few smaller PRs?
Okay, let me split https://github.com/apache/pulsar/pull/23216 into multiple PRs later.
https://github.com/apache/pulsar/pull/23217 is the first PR to replace the ConcurrentOpenHashMap in load manager. However, it's a bug fix rather than an improvement because the current use of the concurrent hash map is wrong.
It might be a big change. Is it possible to keep the ConcurrentOpenHashMap interface and use ConcurrentHashMap as its implementation? That way we might only need to modify the ConcurrentOpenHashMap file.
It will produce much more garbage code in Pulsar.
The idea of reducing code changes sounds good, but ConcurrentOpenHashMap is not fully compatible with ConcurrentHashMap. Let me list the differences one by one.
- Constructor changes:
  ConcurrentOpenHashMap.<TopicName, PersistentOfflineTopicStats>newBuilder()./* ... */build();
  You would still need to keep the builder with all the meaningless parameters like concurrencyLevel, which is confusing.
- Non-standard keys() and values() APIs:
public List<K> keys() {
    List<K> keys = new ArrayList<>((int) size());
    forEach((key, value) -> keys.add(key));
    return keys;
}

public List<V> values() {
    List<V> values = new ArrayList<>((int) size());
    forEach((key, value) -> values.add(value));
    return values;
}
ConcurrentHashMap does not provide such list-copying methods, precisely because of the overhead of copying all keys or values into a new collection.
- The removeNullValue method, which is really wrong and should not exist.

So let me just split the huge PR into multiple relatively small PRs.
OK. It sounds like we'd better replace ConcurrentOpenHashMap entirely.
Now all PRs are merged, so Pulsar 4.0.0 won't contain any ConcurrentOpenHashMap. Closing this issue.