fdb-record-layer DO NOT MERGE - Measure performance of pre-order implementations

DO NOT MERGE - Measure performance of pre-order implementations

Open hatyo opened this issue 1 year ago • 3 comments

This is on top of #2464 and should not be merged. It adds a JMH Benchmark that compares the performance of current pre-order implementations in TreeLike.

To run the benchmark, go to PreOrderPerformance and run main function. The benchmark generates a random TreeLike tree comprising 100 nodes with maximum of 20 nodes in a level, it then compares the two implementations of pre-order traversal:

The old implementation, which is done in TreeLike<>#inPreOrder()
The new implementation exposed through the Iterable#iterator() interface that TreeLike<> now extends from.

The benchmark creates 1 million pre-ordered lists once using the old implementation and another time using the new implementation, and compares, among other things the memory consumption.

Excerpt from the test results:

Benchmark                                             Mode  Cnt            Score         Error   Units
PreOrderPerformance.measureNew                        avgt    5       496852,082 ±   22302,728   us/op
PreOrderPerformance.measureNew:·gc.alloc.rate         avgt    5          469,146 ±      20,819  MB/sec
PreOrderPerformance.measureNew:·gc.alloc.rate.norm    avgt    5    244400053,833 ±       6,387    B/op
PreOrderPerformance.measureNew:·gc.count              avgt    5            8,000                counts
PreOrderPerformance.measureNew:·gc.time               avgt    5           36,000                    ms
PreOrderPerformance.measureOld                        avgt    5      4122177,704 ± 2476618,195   us/op
PreOrderPerformance.measureOld:·gc.alloc.rate         avgt    5         2688,688 ±    1422,556  MB/sec
PreOrderPerformance.measureOld:·gc.alloc.rate.norm    avgt    5  11428000294,400 ±     400,408    B/op
PreOrderPerformance.measureOld:·gc.count              avgt    5          106,000                counts
PreOrderPerformance.measureOld:·gc.time               avgt    5        21614,000                    ms

I think the important bit here is gc.alloc.rate which shows the rate in which the garbage collector is invoked. In the old implementation 1422,556 MB/sec compared to 20,819 MB/sec, meaning that the new implementation is reducing the allocation rate by ~98,5%.

Jan 25 '24 16:01 hatyo

fdb-record-layer fdb-record-layer copied to clipboard

DO NOT MERGE - Measure performance of pre-order implementations

fdb-record-layer
fdb-record-layer copied to clipboard