[Improvement] Optimize CharRange hashCode: Benchmark Objects.hash vs bitwise operations
Hello @garydgregory I have submitted a new PR that addresses this
- Full unit tests are included, and a comparison of two hashCode algorithms (Objects.hash vs. bitwise operations) is added with both contract validation and performance benchmarks.
I’ve conducted a detailed performa hashCode implementations for the CharRange class (package: org.apache.commons.lang3) and wanted to share the results: 1. Test Overview I benchmarked two hashCode implementations for CharRange (coreorg.apache.commons.lang3.CharRange class): Baseline: hashCodeObjects() (using Objects.hash(end, negated, start) – standard general-purpose implementat Optimized: hashCodeBitwise() (bitwise splicing of startendnegated 2. Key Test Results 2.1 Performance Benchmark (100 million iterations/scenario)
CharRangeTest#testHashCode_ContractCompliance
CharRangeTest#testHashCode_PerformanceComparison
===== CharRange hashCode Efficiency Comparison (Execution count: 100000000 iterations/scenario) =====
Total time for Objects.hash version: 5139 ms, accumulated result: 6360454000000000
Total time for bitwise operation version: 94 ms, accumulated result: 1087927400000000
Efficiency improvement of bitwise version over Objects.hash version: 98.17%
The bitwise implementation achieves a 98.17% reduction in execution time compared to the Objects.hash
2.2 Hash Collision Rate Test (1 million unique CharRange instances)
CharRangeTest#testHashCodeCollisionRate
To verify the hash distribution quality (a critical factor for hash table performance), I conducted a collision rate test with 1 million unique CharRange instances (covering normal/negated ranges, single-character ranges, and extreme value ranges). The results are as follows:
Generated 1000000 unique CharRange instances
===== HashCode Collision Rate Comparison =====
Objects.hash implementation:
Number of unique hash codes: 989719
Total colliding instances: 10281
Collision rate: 1.028100%
Bitwise implementation:
Number of unique hash codes: 999885
Total colliding instances: 115
Collision rate: 0.011500%
The bitwise implementation reduces the hash collision rate by approximately 98.88% (from 1.0281% to 0.0115%) compared to the Objects.hash version, representing a near 99% reduction in colliding instances and drastically improving hash distribution quality.
3. Why the Bitwise Version is Superior
- Extreme efficiency: Uses only 3 native bitwise operations (left shift, bitwise OR, XOR) – no temporary array allocation, autoboxing, loops, or multiplication (all sources of overhead in Objects.hash)
- Contract compliance: Strictly adheres to Java’s hashCode/equals rules (equal instances have identical hashes; unequal instances rarely collide).
- Memory-friendly: No GC overhead (unlike Objects.hash, which creates temporary Object[] arrays).
@IcoreE Why is there a new PR instead of keeping #1524? Now we have to track 2 discussions?
@IcoreE Why is there a new PR instead of keeping #1524? Now we have to track 2 discussions?
@garydgregory I’m so sorry for the confusion caused by the new PR – I’m a beginner with open-source PR workflows and made a misstep here. In #1524, I initially added the hashCode tests in a new CharRangeHashCodeTest.java file, but later realized it’s better to supplement these tests directly in CharRangeTest.java (to keep all CharRange tests in one place), and I have since closed PR #1524. That’s why I created the new PR, which was an incorrect approach. Apologies again for the inconvenience!