Test PR to measure accuracy and performance of Event size computation
Summary
This PR verifies how the Event memory estimation approaches requested in #17736 compare with respect to accuracy and performance. Each should be compared against a byte-perfect measure, ideally one that accounts for object headers and all the memory layout and alignment details covered by the JOL library. However, when JOL is used to compute the retained size of an event, it also follows the references down into the JRuby runtime, as hprof (heap dump) files do when analyzed with tools like Eclipse Memory Analyzer. So determining the byte-perfect real size of an Event is not straightforward.
How is the test conducted?
Test fixtures
The test considers various shapes of events, varying the nesting depth of the fields and the size of the values assigned to each field. I tested 3 value sizes: 11 bytes, 512 bytes and 2KB. Each event has 6 layers of nested maps with 10 elements in each node. Another test used a 2KB payload and a fairly flat event (only 2 layers) with 10 keys each, to understand how the measures change as the nesting is reduced. I think 6 layers of nested values is probably an unusual case for a Logstash event.
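As an illustration of the kind of fixture described above, here is a minimal sketch that builds a nested map with a configurable number of layers, node width and value size. The class name `NestedEventFixture`, the field names and the exact shape (each node holds one nested map plus string leaves) are assumptions made for this example; the fixtures used in the PR may be shaped differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fixture builder, not the code used in this PR.
final class NestedEventFixture {

    // Builds `layers` nested maps, each with `width` entries whose string
    // values are `valueBytes` ASCII characters long.
    static Map<String, Object> build(int layers, int width, int valueBytes) {
        String value = "x".repeat(valueBytes); // requires Java 11+
        return buildLayer(layers, width, value);
    }

    private static Map<String, Object> buildLayer(int remaining, int width, String value) {
        Map<String, Object> node = new LinkedHashMap<>();
        // Assumption: every non-terminal node reserves one slot for the nested map.
        int leaves = remaining > 1 ? width - 1 : width;
        for (int i = 0; i < leaves; i++) {
            node.put("field_" + i, value);
        }
        if (remaining > 1) {
            node.put("nested", buildLayer(remaining - 1, width, value));
        }
        return node;
    }

    // Number of map layers, counting the root.
    static int depth(Map<?, ?> node) {
        int max = 0;
        for (Object v : node.values()) {
            if (v instanceof Map) {
                max = Math.max(max, depth((Map<?, ?>) v));
            }
        }
        return 1 + max;
    }

    public static void main(String[] args) {
        Map<String, Object> deep = build(6, 10, 11);
        Map<String, Object> flat = build(2, 10, 2048);
        System.out.println("deep layers: " + depth(deep) + ", root entries: " + deep.size());
        System.out.println("flat layers: " + depth(flat) + ", root entries: " + flat.size());
    }
}
```

Keeping the depth, width and value size as parameters makes it easy to reproduce both the deep and the flat variants from the same helper.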
Test structure
The test is composed of 2 halves:
- measure the size of events with three methods (the heap dump is for reference)
- benchmark the performance of the three methods to understand how they vary with event size and structure
Each run generates a heap dump, which was opened with Eclipse Memory Analyzer to calculate the retained size of the single org.logstash.Event present.
JOL also computed the retained size, which includes the full JRuby runtime, because the event contains JRuby strings that hold references to the underlying JRuby classes.
Size measures results
Values are in bytes; the variation shown for map navigation and cbor is calculated against the raw size.
| test name | raw | map navigation | map navigation (with keys) | cbor | jol (retained) | hprof(retained) |
|---|---|---|---|---|---|---|
| apache 1KB | 983 | 600(-38.96%) | 767(-21.97%) | 1384 (40.79%) | 12048416 | 3504 |
| apache 2KB | 2339 | 1865(-20.27%) | 2081(-11.03%) | 2776 (18.68%) | 12107000 | 5128 |
| apache 4KB | 3057 | 2521(-17.53%) | 2763(-9.62%) | 3534 (15.60%) | 12109536 | 6216 |
| apache 16KB | 16383 | 16144(-1.46%) | 16257(-0.77%) | 16754(2.26%) | 12152984 | 20048 |
| apache 32KB | 32767 | 32528(-0.73%) | 32641(-0.38%) | 33154(1.18%) | 12217176 | 38096 |
| apache 128KB | 131071 | 130832(-0.18%) | 130945(-0.10%) | 131534(0.35%) | 12505896 | 146224 |
| cloudTrail 1KB | 1602 | 493(-69.23%) | 893(-44.26%) | 2167 (35.27%) | 12116952 | 5368 |
| cloudTrail 2KB | 2465 | 730(-70.39%) | 1316(-46.61%) | 3152 (27.87%) | 12120408 | 7648 |
| cloudTrail 4KB | 3078 | 989(-67.87%) | 1766(-42.63%) | 3822 (24.17%) | 12122640 | 9200 |
| cloudTrail 16KB | 16384 | 15561(-5.02%) | 15922(-2.82%) | 17036(3.98%) | 12389616 | 21412 |
| cloudTrail 32KB | 32768 | 31945(-2.51%) | 32306(-1.41%) | 33432(2.03%) | 12407640 | 39440 |
| cloudTrail 128KB | 131072 | 130249(-0.63%) | 130610(-0.35%) | 131811(0.56%) | 12749432 | 147576 |
| snmp 1KB | 856 | 394(-53.97%) | 1730(102.10%) | 1730(102.10%) | 12116264 | 4944 |
| snmp 2KB | 1739 | 925(-46.81%) | 3242(86.43%) | 3242 (86.43%) | 12119832 | 8656 |
| snmp 4KB | 3017 | 1723(-42.89%) | 5389(78.62%) | 5389 (78.62%) | 12126184 | 13776 |
| snmp 16KB | 20535 | 11167(-45.62%) | 28314(37.88%) | 28314(37.88%) | 12678112 | 73160 |
| snmp 32KB | 41125 | 22385(-45.57%) | 56430(37.22%) | 56430(37.22%) | 12727432 | 145296 |
| snmp 128KB | 165100 | 89930(-45.53%) | 225720(36.72%) | 225720(36.72%) | 13265664 | 579640 |
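The percentages in the table are the signed difference between an estimate and the raw size, relative to the raw size. A minimal sketch of that arithmetic, using the "apache 1KB" row as input (the class and method names are made up for this example):

```java
import java.util.Locale;

// Hypothetical helper reproducing the "variation against raw" column format.
final class SizeDelta {

    // Signed variation of an estimate relative to the raw size, in percent.
    static double percentDelta(long rawBytes, long estimatedBytes) {
        return (estimatedBytes - rawBytes) * 100.0 / rawBytes;
    }

    // Formats a cell the way the table does, e.g. "600(-38.96%)".
    static String format(long rawBytes, long estimatedBytes) {
        return String.format(Locale.ROOT, "%d(%.2f%%)",
                estimatedBytes, percentDelta(rawBytes, estimatedBytes));
    }

    public static void main(String[] args) {
        // Values taken from the "apache 1KB" row: raw = 983 bytes.
        System.out.println(format(983, 600));  // map navigation -> 600(-38.96%)
        System.out.println(format(983, 1384)); // cbor           -> 1384(40.79%)
    }
}
```

A negative value means the method underestimates the raw size, a positive value means it overestimates it.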
Calculation benchmarks
Values are throughput (higher is better), expressed in ops/ms for map navigation and cbor and in ops/s for JOL.
Small set of benchmarks, each run for 30 seconds:
| Benchmark | map navigation (ops/ms) | cbor (ops/ms) |
|---|---|---|
| apache 1KB | 3416.043 ± 116.241 (x6.9) | 496.853 ± 6.772 |
| apache 2KB | 2869.710 ± 35.520 (x8.1) | 352.564 ± 4.181 |
| apache 4KB | 2553.733 ± 20.230 (x8.6) | 295.903 ± 2.774 |
| apache 16KB | 1562.214 ± 15.322 (x16.5) | 94.704 ± 0.648 |
| apache 32KB | 532.964 ± 10.288 (x10.0) | 53.366 ± 0.575 |
| apache 128KB | 232.794 ± 6.071 (x15.8) | 14.688 ± 0.194 |
Full set of benchmarks, each run for 3 seconds:
| Benchmark | map navigation (ops/ms) | cbor (ops/ms) | JOL (ops/s) |
|---|---|---|---|
| apache 1KB | 3411.148 ± 269.988 (x7.0) | 486.767 ± 34.517 | 2.341 ± 0.159 |
| apache 2KB | 2824.454 ± 191.709 (x8.1) | 349.975 ± 25.439 | 2.230 ± 0.300 |
| apache 4KB | 2399.100 ± 166.685 (x8.3) | 289.526 ± 19.217 | 2.312 ± 0.129 |
| apache 16KB | 1618.269 ± 66.494 (x17.0) | 95.368 ± 7.417 | 2.328 ± 0.145 |
| apache 32KB | 547.731 ± 33.207 (x10.7) | 51.898 ± 2.962 | 1.935 ± 0.103 |
| apache 128KB | 233.352 ± 10.044 (x16.6) | 14.653 ± 0.877 | 2.345 ± 0.136 |
| cloudTrail 1KB | 995.575 ± 28.435 (x4.0) | 245.268 ± 9.794 | 2.379 ± 0.138 |
| cloudTrail 2KB | 654.018 ± 32.642 (x3.3) | 197.738 ± 16.743 | 2.347 ± 0.129 |
| cloudTrail 4KB | 604.989 ± 26.025 (x3.7) | 161.719 ± 11.014 | 1.997 ± 0.096 |
| cloudTrail 16KB | 612.762 ± 25.644 (x6.9) | 88.038 ± 6.166 | 2.074 ± 0.133 |
| cloudTrail 32KB | 551.232 ± 30.878 (x11.2) | 49.984 ± 2.780 | 2.152 ± 0.143 |
| cloudTrail 128KB | 258.711 ± 12.238 (x18.4) | 14.476 ± 1.290 | 2.081 ± 0.102 |
| snmp 1KB | 1128.517 ± 33.982 (x3.6) | 312.146 ± 20.351 | 2.118 ± 0.139 |
| snmp 2KB | 715.210 ± 34.349 (x4.2) | 168.884 ± 9.136 | 2.315 ± 0.107 |
| snmp 4KB | 294.513 ± 84.895 (x4.7) | 61.864 ± 22.254 | 1.373 ± 0.453 |
| snmp 16KB | 115.842 ± 9.596 (x4.8) | 23.650 ± 1.413 | 2.456 ± 0.151 |
| snmp 32KB | 42.389 ± 5.794 (x3.5) | 11.942 ± 0.707 | 1.708 ± 0.137 |
| snmp 128KB | 14.783 ± 1.564 (x4.8) | 2.936 ± 0.202 | 2.391 ± 0.136 |
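The `(xN)` factor next to the map navigation score appears to be the ratio between the map navigation and cbor throughput on the same row. A small sketch of that computation, together with the ops/ms to ops/s conversion needed to compare against the JOL column (names are hypothetical, inputs come from the "apache 1KB" row of the 30 second run):

```java
import java.util.Locale;

// Hypothetical helper for reading the benchmark tables.
final class ThroughputRatio {

    // How many times faster map navigation is than cbor (same unit on both sides).
    static double speedup(double mapNavigationOpsPerMs, double cborOpsPerMs) {
        return mapNavigationOpsPerMs / cborOpsPerMs;
    }

    // Converts a score in ops/ms to ops/s, the unit used for the JOL column.
    static double toOpsPerSecond(double opsPerMs) {
        return opsPerMs * 1000.0;
    }

    // Formats the factor the way the table does, e.g. "x6.9".
    static String format(double mapNavigationOpsPerMs, double cborOpsPerMs) {
        return String.format(Locale.ROOT, "x%.1f",
                speedup(mapNavigationOpsPerMs, cborOpsPerMs));
    }

    public static void main(String[] args) {
        System.out.println(format(3416.043, 496.853));       // x6.9
        System.out.printf(Locale.ROOT, "%.0f ops/s%n",
                toOpsPerSecond(3416.043));                   // about 3.4 million ops/s
    }
}
```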
Analysis of the results
- JOL and hprof provide the retained size of the object graph. Hprof is not a viable solution for runtime measurement and is used only as a reference. JOL navigates the graph more deeply and pulls in a big chunk of the JRuby runtime classes (I think).
- ConvertedMap custom navigation is consistently below the raw size for such small events, and CBOR is consistently above it. The size of the variation depends on the event structure.
- The ConvertedMap calculation does not include the keys because they are interned, which would explain why its delta against raw is consistently negative.
- From a performance perspective, the ConvertedMap custom navigation performs better than both CBOR serialization and JOL. JOL is orders of magnitude slower than the others (measured in ops per second instead of ops per millisecond).
- Map navigation and CBOR reach millions of operations per second on small events (thousands on the largest snmp events), so neither should impose a meaningful performance penalty on Logstash.
- Closes #17736
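To make the "map navigation" idea in the analysis concrete, here is a minimal sketch of a recursive walk that sums the payload bytes of the values, optionally adding the keys. This is not the ConvertedMap implementation from the PR: the class name, the per-type sizes and the handling of unknown types are assumptions chosen for illustration, and it works on plain `java.util.Map` instances rather than on Logstash types.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical estimator, for illustration only.
final class MapSizeEstimator {

    // Recursively sums an approximate payload size, in bytes.
    static long estimate(Object value, boolean includeKeys) {
        if (value instanceof Map) {
            long total = 0;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (includeKeys) {
                    total += utf8Length(String.valueOf(e.getKey()));
                }
                total += estimate(e.getValue(), includeKeys);
            }
            return total;
        }
        if (value instanceof Iterable) {
            long total = 0;
            for (Object item : (Iterable<?>) value) {
                total += estimate(item, includeKeys);
            }
            return total;
        }
        if (value instanceof CharSequence) {
            return utf8Length(value.toString());
        }
        // Assumed fixed sizes for boxed primitives (payload only, no object header).
        if (value instanceof Long || value instanceof Double) return 8;
        if (value instanceof Integer || value instanceof Float) return 4;
        if (value instanceof Boolean) return 1;
        return 0; // null and unknown types are ignored in this sketch
    }

    private static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Small sample event used by the usage example below.
    static Map<String, Object> sampleEvent() {
        Map<String, Object> host = new LinkedHashMap<>();
        host.put("name", "web-1");
        host.put("port", 8080);
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("message", "hello");
        event.put("host", host);
        return event;
    }

    public static void main(String[] args) {
        Map<String, Object> event = sampleEvent();
        System.out.println(estimate(event, false)); // 14: "hello" + "web-1" + 4-byte int
        System.out.println(estimate(event, true));  // 33: adds the 19 bytes of the keys
    }
}
```

Skipping the keys makes the estimate smaller than the raw size, which matches the direction of the negative deltas discussed above.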