fdb-record-layer
fdb-record-layer copied to clipboard
DO NOT MERGE - Measure performance of pre-order implementations
This is on top of #2464 and should not be merged. It adds a JMH Benchmark that compares the performance of current pre-order implementations in TreeLike
.
To run the benchmark, go to PreOrderPerformance
and run main
function. The benchmark generates a random TreeLike
tree comprising 100 nodes with maximum of 20 nodes in a level, it then compares the two implementations of pre-order traversal:
- The old implementation, which is done in
TreeLike<>#inPreOrder()
- The new implementation exposed through the
Iterable#iterator()
interface thatTreeLike<>
now extends from.
The benchmark creates 1 million pre-ordered lists once using the old implementation and another time using the new implementation, and compares, among other things the memory consumption.
Excerpt from the test results:
Benchmark Mode Cnt Score Error Units
PreOrderPerformance.measureNew avgt 5 496852,082 ± 22302,728 us/op
PreOrderPerformance.measureNew:·gc.alloc.rate avgt 5 469,146 ± 20,819 MB/sec
PreOrderPerformance.measureNew:·gc.alloc.rate.norm avgt 5 244400053,833 ± 6,387 B/op
PreOrderPerformance.measureNew:·gc.count avgt 5 8,000 counts
PreOrderPerformance.measureNew:·gc.time avgt 5 36,000 ms
PreOrderPerformance.measureOld avgt 5 4122177,704 ± 2476618,195 us/op
PreOrderPerformance.measureOld:·gc.alloc.rate avgt 5 2688,688 ± 1422,556 MB/sec
PreOrderPerformance.measureOld:·gc.alloc.rate.norm avgt 5 11428000294,400 ± 400,408 B/op
PreOrderPerformance.measureOld:·gc.count avgt 5 106,000 counts
PreOrderPerformance.measureOld:·gc.time avgt 5 21614,000 ms
I think the important bit here is gc.alloc.rate
which shows the rate in which the garbage collector is invoked. In the old implementation 1422,556 MB/sec compared to 20,819 MB/sec, meaning that the new implementation is reducing the allocation rate by ~98,5%.