Double the performance of measuring collections
This PR makes measuring collections (deeply) 2.3 times faster on average (across all strategies):
INSTRUMENTATION:
- ArrayList: 1.6X faster
- HashMap: 1.9X faster
INSTRUMENTATION_AND_SPECIFICATION:
- ArrayList: 2X faster
- HashMap: 2X faster
UNSAFE:
- ArrayList: 2.2X faster
- HashMap: 3X faster
SPECIFICATION:
- ArrayList: 2.2X faster
- HashMap: 3.7X faster
The performance of other workloads such as measuring small objects remains unchanged, within the margin of error, as the setup for the new optimization is negligible compared to the effort of measuring them. I'll add another standalone comment with JMH benchmark results.
Approach: Create a temporary cache every time the measureDeep function is called to cache class metadata (shallow size and the fields to be measured). When we encounter another object of the same type within the object graph, instead of re-computing the shallow size and re-traversing the inheritance hierarchy of that object to discover the declared fields and then check each field against the field filter, we can skip all that and use the results from the cache.
Since this temporary local cache only exists within the scope of the measureDeep method, there should be no threading concerns or concerns about the cache growing unbounded over multiple invocations of the measureDeep method since the cache is discarded when the method returns.
This resolves https://github.com/jbellis/jamm/issues/65
JMH Benchmarks: JMH version: 1.36 VM version: JDK 1.8.0_392, OpenJDK 64-Bit Server VM, 25.392-b08
Notice how the BenchmarkMeasureCollections times improved dramatically along with even tighter margins of error for these scenarios after the optimizations.
Before optimizations
Benchmark (guess) Mode Cnt Score Error Units
BenchmarkMeasureArray.measure INSTRUMENTATION avgt 5 43.442 � 0.260 us/op
BenchmarkMeasureArray.measure INSTRUMENTATION_AND_SPECIFICATION avgt 5 3.319 � 0.046 us/op
BenchmarkMeasureArray.measure SPECIFICATION avgt 5 3.317 � 0.025 us/op
BenchmarkMeasureArray.measure UNSAFE avgt 5 3.369 � 0.101 us/op
BenchmarkMeasureArray.measureByteArray INSTRUMENTATION avgt 5 43.700 � 0.627 us/op
BenchmarkMeasureArray.measureByteArray INSTRUMENTATION_AND_SPECIFICATION avgt 5 3.178 � 0.282 us/op
BenchmarkMeasureArray.measureByteArray SPECIFICATION avgt 5 3.130 � 0.225 us/op
BenchmarkMeasureArray.measureByteArray UNSAFE avgt 5 3.103 � 0.130 us/op
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION avgt 5 61603.071 � 588.509 us/op
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 52686.509 � 502.536 us/op
BenchmarkMeasureCollections.measureArrayListDeep SPECIFICATION avgt 5 59296.212 � 652.282 us/op
BenchmarkMeasureCollections.measureArrayListDeep UNSAFE avgt 5 58910.304 � 928.203 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION avgt 5 94641.935 � 895.857 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 79820.642 � 813.532 us/op
BenchmarkMeasureCollections.measureHashMapDeep SPECIFICATION avgt 5 149428.673 � 7604.924 us/op
BenchmarkMeasureCollections.measureHashMapDeep UNSAFE avgt 5 119226.557 � 1064.464 us/op
BenchmarkMeasureInstance.measure INSTRUMENTATION avgt 5 43.997 � 0.979 us/op
BenchmarkMeasureInstance.measure INSTRUMENTATION_AND_SPECIFICATION avgt 5 43.026 � 0.477 us/op
BenchmarkMeasureInstance.measure SPECIFICATION avgt 5 70.504 � 1.467 us/op
BenchmarkMeasureInstance.measure UNSAFE avgt 5 71.571 � 0.710 us/op
BenchmarkMeasureString.measureDeep INSTRUMENTATION avgt 5 115.894 � 2.163 us/op
BenchmarkMeasureString.measureDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 98.576 � 1.663 us/op
BenchmarkMeasureString.measureDeep UNSAFE avgt 5 94.347 � 3.175 us/op
BenchmarkMeasureString.measureDeep SPECIFICATION avgt 5 93.773 � 0.349 us/op
BenchmarkMeasureString.measureStringDeep INSTRUMENTATION avgt 5 45.072 � 0.666 us/op
BenchmarkMeasureString.measureStringDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 3.393 � 0.073 us/op
BenchmarkMeasureString.measureStringDeep UNSAFE avgt 5 3.418 � 0.061 us/op
BenchmarkMeasureString.measureStringDeep SPECIFICATION avgt 5 3.386 � 0.063 us/op
BenchmarkObjectGraphTraversal.measureThroughMeasurable N/A avgt 5 0.490 � 0.010 us/op
BenchmarkObjectGraphTraversal.measureThroughReflection N/A avgt 5 1.563 � 0.010 us/op
After optimizations:
Benchmark (guess) Mode Cnt Score Error Units
BenchmarkMeasureArray.measure INSTRUMENTATION avgt 5 43.778 � 0.323 us/op
BenchmarkMeasureArray.measure INSTRUMENTATION_AND_SPECIFICATION avgt 5 3.309 � 0.014 us/op
BenchmarkMeasureArray.measure SPECIFICATION avgt 5 3.261 � 0.078 us/op
BenchmarkMeasureArray.measure UNSAFE avgt 5 3.315 � 0.117 us/op
BenchmarkMeasureArray.measureByteArray INSTRUMENTATION avgt 5 44.597 � 1.500 us/op
BenchmarkMeasureArray.measureByteArray INSTRUMENTATION_AND_SPECIFICATION avgt 5 3.128 � 0.067 us/op
BenchmarkMeasureArray.measureByteArray SPECIFICATION avgt 5 3.156 � 0.121 us/op
BenchmarkMeasureArray.measureByteArray UNSAFE avgt 5 3.111 � 0.087 us/op
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION avgt 5 37180.371 � 473.874 us/op
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 26504.371 � 69.736 us/op
BenchmarkMeasureCollections.measureArrayListDeep SPECIFICATION avgt 5 26113.490 � 120.462 us/op
BenchmarkMeasureCollections.measureArrayListDeep UNSAFE avgt 5 26109.842 � 182.293 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION avgt 5 49822.571 � 422.441 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 39136.548 � 115.379 us/op
BenchmarkMeasureCollections.measureHashMapDeep SPECIFICATION avgt 5 40174.768 � 189.955 us/op
BenchmarkMeasureCollections.measureHashMapDeep UNSAFE avgt 5 39296.766 � 239.355 us/op
BenchmarkMeasureInstance.measure INSTRUMENTATION avgt 5 42.704 � 0.770 us/op
BenchmarkMeasureInstance.measure INSTRUMENTATION_AND_SPECIFICATION avgt 5 42.924 � 0.803 us/op
BenchmarkMeasureInstance.measure SPECIFICATION avgt 5 70.119 � 1.046 us/op
BenchmarkMeasureInstance.measure UNSAFE avgt 5 70.582 � 1.179 us/op
BenchmarkMeasureString.measureDeep INSTRUMENTATION avgt 5 118.157 � 2.244 us/op
BenchmarkMeasureString.measureDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 94.359 � 0.218 us/op
BenchmarkMeasureString.measureDeep UNSAFE avgt 5 95.071 � 1.278 us/op
BenchmarkMeasureString.measureDeep SPECIFICATION avgt 5 96.658 � 2.656 us/op
BenchmarkMeasureString.measureStringDeep INSTRUMENTATION avgt 5 45.014 � 0.843 us/op
BenchmarkMeasureString.measureStringDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 5.547 � 7.695 us/op
BenchmarkMeasureString.measureStringDeep UNSAFE avgt 5 3.391 � 0.030 us/op
BenchmarkMeasureString.measureStringDeep SPECIFICATION avgt 5 3.402 � 0.085 us/op
BenchmarkObjectGraphTraversal.measureThroughMeasurable N/A avgt 5 0.513 � 0.005 us/op
BenchmarkObjectGraphTraversal.measureThroughReflection N/A avgt 5 1.571 � 0.018 us/op
@daniel-rusu Thanks for the patch. I will be curious to see the JMH result when running the benchmark with Java 17 as the numbers in general differ a lot from Java 8. The benchmark is also targetting large collections with 1000 elements. For most applications, I would expect collections to be significantly smaller something between 0 and 100 elements. Using fix collection size in a benchmark also allow the JVM to do some optimisation that it will not be able to do with real collections. The benchmark should use collection of different size: something like 70% between 0 and 100, 20% between 100 and 700 and 10% between 700 and 1000.
Regarding the approach, it is heavily targeted toward collection only measurements. If you measure a deep tree of object, I expect that creating the cache will at some point impact negatively the measurements performance. Specially if you have multiple threads doing similar types of measurements at the same time.
@daniel-rusu Thanks for the patch. I will be curious to see the JMH result when running the benchmark with Java 17 as the numbers in general differ a lot from Java 8. The benchmark is also targetting large collections with 1000 elements. For most applications, I would expect collections to be significantly smaller something between 0 and 100 elements. Using fix collection size in a benchmark also allow the JVM to do some optimisation that it will not be able to do with real collections. The benchmark should use collection of different size: something like 70% between 0 and 100, 20% between 100 and 700 and 10% between 700 and 1000.
Regarding the approach, it is heavily targeted toward collection only measurements. If you measure a deep tree of object, I expect that creating the cache will at some point impact negatively the measurements performance. Specially if you have multiple threads doing similar types of measurements at the same time.
@blerer thanks for the feedback. I added another commit so that each collection has a random size that follows your suggested distribution:
- 70% between 0 to 100 elements
- 20% between 100 to 700 elements
- 10% between 700 to 1000 elements
I also added a linked list benchmark to see how it performs with deeply nested data since measuring that will need to traverse all the links to get to the end of the collection.
The updated performance improvements with randomized collection sizes are:
INSTRUMENTATION:
- ArrayList: 1.6X faster
- HashMap: 1.6X faster
- LinkedList: 1.8X faster
INSTRUMENTATION_AND_SPECIFICATION:
- ArrayList: 1.9X faster
- HashMap: 2.1X faster
- LinkedList: 2.1X faster
UNSAFE:
- ArrayList: 2.2X faster
- HashMap: 2.2X faster
- LinkedList: 2.4X faster
SPECIFICATION:
- ArrayList: 2.2X faster
- HashMap: 2.3X faster
- LinkedList: 2.4X faster
Interestingly, looking at the performance progression going from array list to hashmap to linked list, the performance improvement seems to increase as the object graph becomes more deeply nested.
I wasn't able to run the JMH benchmarks with Java 17 (or 11) as triggering the benchmark recompiles the project and that failed due to a bunch of compilation errors about unrecognized symbols. Since the caching logic still performs the same actions the first time but with slight changes to store the results in a map, the real overhead is in creating these container objects and inserting them in the map as everything else is the same work as before when encountering a new type (and faster when encountering another instance of the same type). Since creating objects on the JVM is extremely cheap (usually just incrementing a pointer in the free space based on the amount of memory requested), I suspect that the caching overhead is negligible. This reasoning seems to align with the initial benchmarks above for the pre-existing benchmarks where caching isn't beneficial which resulted in unaffected measurements (within the margin of error). So I expect that Java 17 will improve both versions (before & after optimizations) by the same percentage since caching seems to be a negligible part of the runtime.
Lastly, the actual results: VM version: JDK 1.8.0_392, OpenJDK 64-Bit Server VM, 25.392-b08
Before optimizations:
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION avgt 5 10541.258 � 424.504 us/op
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 8929.104 � 147.143 us/op
BenchmarkMeasureCollections.measureArrayListDeep SPECIFICATION avgt 5 10208.275 � 211.971 us/op
BenchmarkMeasureCollections.measureArrayListDeep UNSAFE avgt 5 10212.607 � 84.314 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION avgt 5 15687.822 � 362.651 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 15629.262 � 3977.612 us/op
BenchmarkMeasureCollections.measureHashMapDeep SPECIFICATION avgt 5 16561.246 � 203.542 us/op
BenchmarkMeasureCollections.measureHashMapDeep UNSAFE avgt 5 16728.401 � 260.310 us/op
BenchmarkMeasureCollections.measureLinkedListDeep INSTRUMENTATION avgt 5 15136.675 � 290.159 us/op
BenchmarkMeasureCollections.measureLinkedListDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 12841.208 � 230.984 us/op
BenchmarkMeasureCollections.measureLinkedListDeep SPECIFICATION avgt 5 14743.332 � 194.358 us/op
BenchmarkMeasureCollections.measureLinkedListDeep UNSAFE avgt 5 14505.943 � 222.238 us/op
After optimizations
Benchmark (guess) Mode Cnt Score Error Units
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION avgt 5 6270.412 � 152.350 us/op
BenchmarkMeasureCollections.measureArrayListDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 4616.823 � 63.404 us/op
BenchmarkMeasureCollections.measureArrayListDeep SPECIFICATION avgt 5 4558.527 � 54.508 us/op
BenchmarkMeasureCollections.measureArrayListDeep UNSAFE avgt 5 4520.083 � 34.736 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION avgt 5 9280.512 � 127.181 us/op
BenchmarkMeasureCollections.measureHashMapDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 7141.111 � 127.815 us/op
BenchmarkMeasureCollections.measureHashMapDeep SPECIFICATION avgt 5 7195.064 � 140.975 us/op
BenchmarkMeasureCollections.measureHashMapDeep UNSAFE avgt 5 7354.924 � 161.352 us/op
BenchmarkMeasureCollections.measureLinkedListDeep INSTRUMENTATION avgt 5 8041.984 � 139.366 us/op
BenchmarkMeasureCollections.measureLinkedListDeep INSTRUMENTATION_AND_SPECIFICATION avgt 5 6049.960 � 75.066 us/op
BenchmarkMeasureCollections.measureLinkedListDeep SPECIFICATION avgt 5 6066.878 � 107.686 us/op
BenchmarkMeasureCollections.measureLinkedListDeep UNSAFE avgt 5 5983.891 � 62.097 us/op
Note that I only re-ran the collection benchmarks as the others are unchanged
@blerer I'm curious if I can do anything to help get this merged as It's been a couple of months.
Since it only compiles on JDK 8, I'm not sure if it's possible to run the JMH benchmarks on JDK 17 but I'm open to suggestions. While JDK 17 will have better absolute performance, I expect these changes to improve the performance on JDK 17 by the same ratio.
Do you have any suggestions to improve this PR further?
Sorry for the delay, @daniel-rusu. I have been busy elsewhere. I will try to get some time to look into the patch in 2 weeks as I need to finish some other stuff first.