jamm Double the performance of measuring collections

This PR makes measuring collections (deeply) 2.3 times faster on average (across all strategies):

INSTRUMENTATION:

ArrayList: 1.6X faster
HashMap: 1.9X faster

INSTRUMENTATION_AND_SPECIFICATION:

ArrayList: 2X faster
HashMap: 2X faster

UNSAFE:

ArrayList: 2.2X faster
HashMap: 3X faster

SPECIFICATION:

ArrayList: 2.2X faster
HashMap: 3.7X faster

The performance of other workloads such as measuring small objects remains unchanged, within the margin of error, as the setup for the new optimization is negligible compared to the effort of measuring them. I'll add another standalone comment with JMH benchmark results.

Approach: Create a temporary cache every time the measureDeep function is called to cache class metadata (shallow size and the fields to be measured). When we encounter another object of the same type within the object graph, instead of re-computing the shallow size and re-traversing the inheritance hierarchy of that object to discover the declared fields and then check each field against the field filter, we can skip all that and use the results from the cache.

Since this temporary local cache only exists within the scope of the measureDeep method, there should be no threading concerns or concerns about the cache growing unbounded over multiple invocations of the measureDeep method since the cache is discarded when the method returns.

This resolves https://github.com/jbellis/jamm/issues/65

Oct 31 '23 07:10 daniel-rusu

JMH Benchmarks: JMH version: 1.36 VM version: JDK 1.8.0_392, OpenJDK 64-Bit Server VM, 25.392-b08

Notice how the BenchmarkMeasureCollections times improved dramatically along with even tighter margins of error for these scenarios after the optimizations.

Before optimizations

Benchmark                                                                         (guess)  Mode  Cnt       Score      Error  Units
BenchmarkMeasureArray.measure                                             INSTRUMENTATION  avgt    5      43.442 �    0.260  us/op
BenchmarkMeasureArray.measure                           INSTRUMENTATION_AND_SPECIFICATION  avgt    5       3.319 �    0.046  us/op
BenchmarkMeasureArray.measure                                               SPECIFICATION  avgt    5       3.317 �    0.025  us/op
BenchmarkMeasureArray.measure                                                      UNSAFE  avgt    5       3.369 �    0.101  us/op
BenchmarkMeasureArray.measureByteArray                                    INSTRUMENTATION  avgt    5      43.700 �    0.627  us/op
BenchmarkMeasureArray.measureByteArray                  INSTRUMENTATION_AND_SPECIFICATION  avgt    5       3.178 �    0.282  us/op
BenchmarkMeasureArray.measureByteArray                                      SPECIFICATION  avgt    5       3.130 �    0.225  us/op
BenchmarkMeasureArray.measureByteArray                                             UNSAFE  avgt    5       3.103 �    0.130  us/op
BenchmarkMeasureCollections.measureArrayListDeep                          INSTRUMENTATION  avgt    5   61603.071 �  588.509  us/op
BenchmarkMeasureCollections.measureArrayListDeep        INSTRUMENTATION_AND_SPECIFICATION  avgt    5   52686.509 �  502.536  us/op
BenchmarkMeasureCollections.measureArrayListDeep                            SPECIFICATION  avgt    5   59296.212 �  652.282  us/op
BenchmarkMeasureCollections.measureArrayListDeep                                   UNSAFE  avgt    5   58910.304 �  928.203  us/op
BenchmarkMeasureCollections.measureHashMapDeep                            INSTRUMENTATION  avgt    5   94641.935 �  895.857  us/op
BenchmarkMeasureCollections.measureHashMapDeep          INSTRUMENTATION_AND_SPECIFICATION  avgt    5   79820.642 �  813.532  us/op
BenchmarkMeasureCollections.measureHashMapDeep                              SPECIFICATION  avgt    5  149428.673 � 7604.924  us/op
BenchmarkMeasureCollections.measureHashMapDeep                                     UNSAFE  avgt    5  119226.557 � 1064.464  us/op
BenchmarkMeasureInstance.measure                                          INSTRUMENTATION  avgt    5      43.997 �    0.979  us/op
BenchmarkMeasureInstance.measure                        INSTRUMENTATION_AND_SPECIFICATION  avgt    5      43.026 �    0.477  us/op
BenchmarkMeasureInstance.measure                                            SPECIFICATION  avgt    5      70.504 �    1.467  us/op
BenchmarkMeasureInstance.measure                                                   UNSAFE  avgt    5      71.571 �    0.710  us/op
BenchmarkMeasureString.measureDeep                                        INSTRUMENTATION  avgt    5     115.894 �    2.163  us/op
BenchmarkMeasureString.measureDeep                      INSTRUMENTATION_AND_SPECIFICATION  avgt    5      98.576 �    1.663  us/op
BenchmarkMeasureString.measureDeep                                                 UNSAFE  avgt    5      94.347 �    3.175  us/op
BenchmarkMeasureString.measureDeep                                          SPECIFICATION  avgt    5      93.773 �    0.349  us/op
BenchmarkMeasureString.measureStringDeep                                  INSTRUMENTATION  avgt    5      45.072 �    0.666  us/op
BenchmarkMeasureString.measureStringDeep                INSTRUMENTATION_AND_SPECIFICATION  avgt    5       3.393 �    0.073  us/op
BenchmarkMeasureString.measureStringDeep                                           UNSAFE  avgt    5       3.418 �    0.061  us/op
BenchmarkMeasureString.measureStringDeep                                    SPECIFICATION  avgt    5       3.386 �    0.063  us/op
BenchmarkObjectGraphTraversal.measureThroughMeasurable                                N/A  avgt    5       0.490 �    0.010  us/op
BenchmarkObjectGraphTraversal.measureThroughReflection                                N/A  avgt    5       1.563 �    0.010  us/op

After optimizations:

Benchmark                                                                         (guess)  Mode  Cnt      Score     Error  Units
BenchmarkMeasureArray.measure                                             INSTRUMENTATION  avgt    5     43.778 �   0.323  us/op
BenchmarkMeasureArray.measure                           INSTRUMENTATION_AND_SPECIFICATION  avgt    5      3.309 �   0.014  us/op
BenchmarkMeasureArray.measure                                               SPECIFICATION  avgt    5      3.261 �   0.078  us/op
BenchmarkMeasureArray.measure                                                      UNSAFE  avgt    5      3.315 �   0.117  us/op
BenchmarkMeasureArray.measureByteArray                                    INSTRUMENTATION  avgt    5     44.597 �   1.500  us/op
BenchmarkMeasureArray.measureByteArray                  INSTRUMENTATION_AND_SPECIFICATION  avgt    5      3.128 �   0.067  us/op
BenchmarkMeasureArray.measureByteArray                                      SPECIFICATION  avgt    5      3.156 �   0.121  us/op
BenchmarkMeasureArray.measureByteArray                                             UNSAFE  avgt    5      3.111 �   0.087  us/op
BenchmarkMeasureCollections.measureArrayListDeep                          INSTRUMENTATION  avgt    5  37180.371 � 473.874  us/op
BenchmarkMeasureCollections.measureArrayListDeep        INSTRUMENTATION_AND_SPECIFICATION  avgt    5  26504.371 �  69.736  us/op
BenchmarkMeasureCollections.measureArrayListDeep                            SPECIFICATION  avgt    5  26113.490 � 120.462  us/op
BenchmarkMeasureCollections.measureArrayListDeep                                   UNSAFE  avgt    5  26109.842 � 182.293  us/op
BenchmarkMeasureCollections.measureHashMapDeep                            INSTRUMENTATION  avgt    5  49822.571 � 422.441  us/op
BenchmarkMeasureCollections.measureHashMapDeep          INSTRUMENTATION_AND_SPECIFICATION  avgt    5  39136.548 � 115.379  us/op
BenchmarkMeasureCollections.measureHashMapDeep                              SPECIFICATION  avgt    5  40174.768 � 189.955  us/op
BenchmarkMeasureCollections.measureHashMapDeep                                     UNSAFE  avgt    5  39296.766 � 239.355  us/op
BenchmarkMeasureInstance.measure                                          INSTRUMENTATION  avgt    5     42.704 �   0.770  us/op
BenchmarkMeasureInstance.measure                        INSTRUMENTATION_AND_SPECIFICATION  avgt    5     42.924 �   0.803  us/op
BenchmarkMeasureInstance.measure                                            SPECIFICATION  avgt    5     70.119 �   1.046  us/op
BenchmarkMeasureInstance.measure                                                   UNSAFE  avgt    5     70.582 �   1.179  us/op
BenchmarkMeasureString.measureDeep                                        INSTRUMENTATION  avgt    5    118.157 �   2.244  us/op
BenchmarkMeasureString.measureDeep                      INSTRUMENTATION_AND_SPECIFICATION  avgt    5     94.359 �   0.218  us/op
BenchmarkMeasureString.measureDeep                                                 UNSAFE  avgt    5     95.071 �   1.278  us/op
BenchmarkMeasureString.measureDeep                                          SPECIFICATION  avgt    5     96.658 �   2.656  us/op
BenchmarkMeasureString.measureStringDeep                                  INSTRUMENTATION  avgt    5     45.014 �   0.843  us/op
BenchmarkMeasureString.measureStringDeep                INSTRUMENTATION_AND_SPECIFICATION  avgt    5      5.547 �   7.695  us/op
BenchmarkMeasureString.measureStringDeep                                           UNSAFE  avgt    5      3.391 �   0.030  us/op
BenchmarkMeasureString.measureStringDeep                                    SPECIFICATION  avgt    5      3.402 �   0.085  us/op
BenchmarkObjectGraphTraversal.measureThroughMeasurable                                N/A  avgt    5      0.513 �   0.005  us/op
BenchmarkObjectGraphTraversal.measureThroughReflection                                N/A  avgt    5      1.571 �   0.018  us/op

Oct 31 '23 07:10 daniel-rusu

@daniel-rusu Thanks for the patch. I will be curious to see the JMH result when running the benchmark with Java 17 as the numbers in general differ a lot from Java 8. The benchmark is also targetting large collections with 1000 elements. For most applications, I would expect collections to be significantly smaller something between 0 and 100 elements. Using fix collection size in a benchmark also allow the JVM to do some optimisation that it will not be able to do with real collections. The benchmark should use collection of different size: something like 70% between 0 and 100, 20% between 100 and 700 and 10% between 700 and 1000.

Regarding the approach, it is heavily targeted toward collection only measurements. If you measure a deep tree of object, I expect that creating the cache will at some point impact negatively the measurements performance. Specially if you have multiple threads doing similar types of measurements at the same time.

Nov 01 '23 14:11 blerer

@daniel-rusu Thanks for the patch. I will be curious to see the JMH result when running the benchmark with Java 17 as the numbers in general differ a lot from Java 8. The benchmark is also targetting large collections with 1000 elements. For most applications, I would expect collections to be significantly smaller something between 0 and 100 elements. Using fix collection size in a benchmark also allow the JVM to do some optimisation that it will not be able to do with real collections. The benchmark should use collection of different size: something like 70% between 0 and 100, 20% between 100 and 700 and 10% between 700 and 1000.

Regarding the approach, it is heavily targeted toward collection only measurements. If you measure a deep tree of object, I expect that creating the cache will at some point impact negatively the measurements performance. Specially if you have multiple threads doing similar types of measurements at the same time.

@blerer thanks for the feedback. I added another commit so that each collection has a random size that follows your suggested distribution:

70% between 0 to 100 elements
20% between 100 to 700 elements
10% between 700 to 1000 elements

I also added a linked list benchmark to see how it performs with deeply nested data since measuring that will need to traverse all the links to get to the end of the collection.

The updated performance improvements with randomized collection sizes are:

INSTRUMENTATION:

ArrayList: 1.6X faster
HashMap: 1.6X faster
LinkedList: 1.8X faster

INSTRUMENTATION_AND_SPECIFICATION:

ArrayList: 1.9X faster
HashMap: 2.1X faster
LinkedList: 2.1X faster

UNSAFE:

ArrayList: 2.2X faster
HashMap: 2.2X faster
LinkedList: 2.4X faster

SPECIFICATION:

ArrayList: 2.2X faster
HashMap: 2.3X faster
LinkedList: 2.4X faster

Interestingly, looking at the performance progression going from array list to hashmap to linked list, the performance improvement seems to increase as the object graph becomes more deeply nested.

I wasn't able to run the JMH benchmarks with Java 17 (or 11) as triggering the benchmark recompiles the project and that failed due to a bunch of compilation errors about unrecognized symbols. Since the caching logic still performs the same actions the first time but with slight changes to store the results in a map, the real overhead is in creating these container objects and inserting them in the map as everything else is the same work as before when encountering a new type (and faster when encountering another instance of the same type). Since creating objects on the JVM is extremely cheap (usually just incrementing a pointer in the free space based on the amount of memory requested), I suspect that the caching overhead is negligible. This reasoning seems to align with the initial benchmarks above for the pre-existing benchmarks where caching isn't beneficial which resulted in unaffected measurements (within the margin of error). So I expect that Java 17 will improve both versions (before & after optimizations) by the same percentage since caching seems to be a negligible part of the runtime.

Lastly, the actual results: VM version: JDK 1.8.0_392, OpenJDK 64-Bit Server VM, 25.392-b08

Before optimizations:

BenchmarkMeasureCollections.measureArrayListDeep                     INSTRUMENTATION  avgt    5  10541.258 �  424.504  us/op
BenchmarkMeasureCollections.measureArrayListDeep   INSTRUMENTATION_AND_SPECIFICATION  avgt    5   8929.104 �  147.143  us/op
BenchmarkMeasureCollections.measureArrayListDeep                       SPECIFICATION  avgt    5  10208.275 �  211.971  us/op
BenchmarkMeasureCollections.measureArrayListDeep                              UNSAFE  avgt    5  10212.607 �   84.314  us/op
BenchmarkMeasureCollections.measureHashMapDeep                       INSTRUMENTATION  avgt    5  15687.822 �  362.651  us/op
BenchmarkMeasureCollections.measureHashMapDeep     INSTRUMENTATION_AND_SPECIFICATION  avgt    5  15629.262 � 3977.612  us/op
BenchmarkMeasureCollections.measureHashMapDeep                         SPECIFICATION  avgt    5  16561.246 �  203.542  us/op
BenchmarkMeasureCollections.measureHashMapDeep                                UNSAFE  avgt    5  16728.401 �  260.310  us/op
BenchmarkMeasureCollections.measureLinkedListDeep                    INSTRUMENTATION  avgt    5  15136.675 �  290.159  us/op
BenchmarkMeasureCollections.measureLinkedListDeep  INSTRUMENTATION_AND_SPECIFICATION  avgt    5  12841.208 �  230.984  us/op
BenchmarkMeasureCollections.measureLinkedListDeep                      SPECIFICATION  avgt    5  14743.332 �  194.358  us/op
BenchmarkMeasureCollections.measureLinkedListDeep                             UNSAFE  avgt    5  14505.943 �  222.238  us/op

After optimizations

Benchmark                                                                    (guess)  Mode  Cnt     Score     Error  Units
BenchmarkMeasureCollections.measureArrayListDeep                     INSTRUMENTATION  avgt    5  6270.412 � 152.350  us/op
BenchmarkMeasureCollections.measureArrayListDeep   INSTRUMENTATION_AND_SPECIFICATION  avgt    5  4616.823 �  63.404  us/op
BenchmarkMeasureCollections.measureArrayListDeep                       SPECIFICATION  avgt    5  4558.527 �  54.508  us/op
BenchmarkMeasureCollections.measureArrayListDeep                              UNSAFE  avgt    5  4520.083 �  34.736  us/op
BenchmarkMeasureCollections.measureHashMapDeep                       INSTRUMENTATION  avgt    5  9280.512 � 127.181  us/op
BenchmarkMeasureCollections.measureHashMapDeep     INSTRUMENTATION_AND_SPECIFICATION  avgt    5  7141.111 � 127.815  us/op
BenchmarkMeasureCollections.measureHashMapDeep                         SPECIFICATION  avgt    5  7195.064 � 140.975  us/op
BenchmarkMeasureCollections.measureHashMapDeep                                UNSAFE  avgt    5  7354.924 � 161.352  us/op
BenchmarkMeasureCollections.measureLinkedListDeep                    INSTRUMENTATION  avgt    5  8041.984 � 139.366  us/op
BenchmarkMeasureCollections.measureLinkedListDeep  INSTRUMENTATION_AND_SPECIFICATION  avgt    5  6049.960 �  75.066  us/op
BenchmarkMeasureCollections.measureLinkedListDeep                      SPECIFICATION  avgt    5  6066.878 � 107.686  us/op
BenchmarkMeasureCollections.measureLinkedListDeep                             UNSAFE  avgt    5  5983.891 �  62.097  us/op

Note that I only re-ran the collection benchmarks as the others are unchanged

Nov 02 '23 04:11 daniel-rusu

@blerer I'm curious if I can do anything to help get this merged as It's been a couple of months.

Since it only compiles on JDK 8, I'm not sure if it's possible to run the JMH benchmarks on JDK 17 but I'm open to suggestions. While JDK 17 will have better absolute performance, I expect these changes to improve the performance on JDK 17 by the same ratio.

Do you have any suggestions to improve this PR further?

Jan 03 '24 19:01 daniel-rusu

Sorry for the delay, @daniel-rusu. I have been busy elsewhere. I will try to get some time to look into the patch in 2 weeks as I need to finish some other stuff first.

Jan 08 '24 12:01 blerer