commons-lang icon indicating copy to clipboard operation
commons-lang copied to clipboard

[Improvement] Optimize CharRange hashCode: Benchmark Objects.hash vs bitwise operations

Open IcoreE opened this issue 1 month ago • 2 comments

Hello @garydgregory I have submitted a new PR that addresses this

  • Full unit tests are included, and a comparison of two hashCode algorithms (Objects.hash vs. bitwise operations) is added with both contract validation and performance benchmarks.

I’ve conducted a detailed performa hashCode implementations for the CharRange class (package: org.apache.commons.lang3) and wanted to share the results: 1. Test Overview I benchmarked two hashCode implementations for CharRange (coreorg.apache.commons.lang3.CharRange class): Baseline: hashCodeObjects() (using Objects.hash(end, negated, start) – standard general-purpose implementat Optimized: hashCodeBitwise() (bitwise splicing of startendnegated 2. Key Test Results 2.1 Performance Benchmark (100 million iterations/scenario)

CharRangeTest#testHashCode_ContractCompliance
CharRangeTest#testHashCode_PerformanceComparison
===== CharRange hashCode Efficiency Comparison (Execution count: 100000000 iterations/scenario) =====
Total time for Objects.hash version: 5139 ms, accumulated result: 6360454000000000
Total time for bitwise operation version: 94 ms, accumulated result: 1087927400000000
Efficiency improvement of bitwise version over Objects.hash version: 98.17%

The bitwise implementation achieves a 98.17% reduction in execution time compared to the Objects.hash

2.2 Hash Collision Rate Test (1 million unique CharRange instances) CharRangeTest#testHashCodeCollisionRate To verify the hash distribution quality (a critical factor for hash table performance), I conducted a collision rate test with 1 million unique CharRange instances (covering normal/negated ranges, single-character ranges, and extreme value ranges). The results are as follows:

Generated 1000000 unique CharRange instances
===== HashCode Collision Rate Comparison =====
Objects.hash implementation:
  Number of unique hash codes: 989719
  Total colliding instances: 10281
  Collision rate: 1.028100%
Bitwise implementation:
  Number of unique hash codes: 999885
  Total colliding instances: 115
  Collision rate: 0.011500%

The bitwise implementation reduces the hash collision rate by approximately 98.88% (from 1.0281% to 0.0115%) compared to the Objects.hash version, representing a near 99% reduction in colliding instances and drastically improving hash distribution quality.

3. Why the Bitwise Version is Superior

  1. Extreme efficiency: Uses only 3 native bitwise operations (left shift, bitwise OR, XOR) – no temporary array allocation, autoboxing, loops, or multiplication (all sources of overhead in Objects.hash)
  2. Contract compliance: Strictly adheres to Java’s hashCode/equals rules (equal instances have identical hashes; unequal instances rarely collide).
  3. Memory-friendly: No GC overhead (unlike Objects.hash, which creates temporary Object[] arrays).

IcoreE avatar Dec 15 '25 12:12 IcoreE

@IcoreE Why is there a new PR instead of keeping #1524? Now we have to track 2 discussions?

garydgregory avatar Dec 15 '25 13:12 garydgregory

@IcoreE Why is there a new PR instead of keeping #1524? Now we have to track 2 discussions?

@garydgregory I’m so sorry for the confusion caused by the new PR – I’m a beginner with open-source PR workflows and made a misstep here. In #1524, I initially added the hashCode tests in a new CharRangeHashCodeTest.java file, but later realized it’s better to supplement these tests directly in CharRangeTest.java (to keep all CharRange tests in one place), and I have since closed PR #1524. That’s why I created the new PR, which was an incorrect approach. Apologies again for the inconvenience!

IcoreE avatar Dec 15 '25 13:12 IcoreE