logstash Test PR to measure accuracy and performance of Event size computation

Summary

This PR verifies how the Event memory estimation methods requested in #17736 vary with respect to accuracy and performance. All of them should be compared against a byte-perfect measure, ideally one that accounts for object headers and all the memory layout and alignment details covered by the JOL library. However, when JOL is used to compute the retained size of an event, it also follows the references down to the JRuby runtime, as hprof (heap dump) files do when analyzed with tools like Eclipse Memory Analyzer. So determining the byte-perfect real size of an Event is not so obvious.

How is the test conducted?

Test fixtures

The test considers various forms of events, varying the nesting depth of the fields and the size of the values assigned to each field. I tested 3 value sizes: 11 bytes, 512 bytes and 2KB. Each event has 6 layers of nested maps with 10 elements in each node. Another test used a 2KB payload and a fairly flat event (2 layers only) with 10 keys each, to understand how the measures change when the nesting is reduced. I think 6 layers of nested values could be an unusual case for a Logstash event.

Test structure

The test is composed of 2 halves:

measure the size of events with three methods (the heap dump is used as a reference)
benchmark the performance of the three methods to understand how it varies with event size and structure

Each run generates a heap dump, which was opened with Eclipse Memory Analyzer to calculate the retained size of the single org.logstash.Event present. JOL also computed the retained size, which in its case includes the full JRuby runtime, because the event contains JRuby strings that hold references to the underlying JRuby classes.

Size measures results

Values are in bytes; the variation for map navigation and cbor is calculated against the raw size.

test name	raw	map navigation	map navigation (with keys)	cbor	jol (retained)	hprof (retained)
apache 1KB	983	600 (-38.96%)	767 (-21.97%)	1384 (40.79%)	12048416	3504
apache 2KB	2339	1865 (-20.27%)	2081 (-11.03%)	2776 (18.68%)	12107000	5128
apache 4KB	3057	2521 (-17.53%)	2763 (-9.62%)	3534 (15.60%)	12109536	6216
apache 16KB	16383	16144 (-1.46%)	16257 (-0.77%)	16754 (2.26%)	12152984	20048
apache 32KB	32767	32528 (-0.73%)	32641 (-0.38%)	33154 (1.18%)	12217176	38096
apache 128KB	131071	130832 (-0.18%)	130945 (-0.10%)	131534 (0.35%)	12505896	146224
cloudTrail 1KB	1602	493 (-69.23%)	893 (-44.26%)	2167 (35.27%)	12116952	5368
cloudTrail 2KB	2465	730 (-70.39%)	1316 (-46.61%)	3152 (27.87%)	12120408	7648
cloudTrail 4KB	3078	989 (-67.87%)	1766 (-42.63%)	3822 (24.17%)	12122640	9200
cloudTrail 16KB	16384	15561 (-5.02%)	15922 (-2.82%)	17036 (3.98%)	12389616	21412
cloudTrail 32KB	32768	31945 (-2.51%)	32306 (-1.41%)	33432 (2.03%)	12407640	39440
cloudTrail 128KB	131072	130249 (-0.63%)	130610 (-0.35%)	131811 (0.56%)	12749432	147576
snmp 1KB	856	394 (-53.97%)	1730 (102.10%)	1730 (102.10%)	12116264	4944
snmp 2KB	1739	925 (-46.81%)	3242 (86.43%)	3242 (86.43%)	12119832	8656
snmp 4KB	3017	1723 (-42.89%)	5389 (78.62%)	5389 (78.62%)	12126184	13776
snmp 16KB	20535	11167 (-45.62%)	28314 (37.88%)	28314 (37.88%)	12678112	73160
snmp 32KB	41125	22385 (-45.57%)	56430 (37.22%)	56430 (37.22%)	12727432	145296
snmp 128KB	165100	89930 (-45.53%)	225720 (36.72%)	225720 (36.72%)	13265664	579640
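
The percentages in the table can be recomputed from the byte counts. The sketch below assumes the variation is (estimate - raw) / raw, expressed in percent, which matches the "apache 1KB" row; the class and method names are hypothetical and not part of this PR.

```java
import java.util.Locale;

// Hypothetical helper, not part of this PR: recomputes the variation column.
final class SizeVariation {

    // Relative difference of an estimate against the raw size, in percent.
    static double variationPercent(long estimateBytes, long rawBytes) {
        return (estimateBytes - rawBytes) * 100.0 / rawBytes;
    }

    public static void main(String[] args) {
        long raw = 983; // "apache 1KB" row, raw size in bytes
        System.out.printf(Locale.ROOT, "map navigation: %.2f%%%n", variationPercent(600, raw));
        System.out.printf(Locale.ROOT, "with keys:      %.2f%%%n", variationPercent(767, raw));
        System.out.printf(Locale.ROOT, "cbor:           %.2f%%%n", variationPercent(1384, raw));
    }
}
```

A negative value means the method underestimates the raw size, a positive one means it overestimates it.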

Calculation benchmarks

Values are throughput (higher is better). Results are in ops/millisecond, except for JOL, which is in ops/second.

Small set of benchmarks, each run for 30 seconds:

Benchmark	map navigation (ops/ms)	cbor (ops/ms)
apache 1KB	3416.043 ± 116.241 (x6.9)	496.853 ± 6.772
apache 2KB	2869.710 ± 35.520 (x8.1)	352.564 ± 4.181
apache 4KB	2553.733 ± 20.230 (x8.6)	295.903 ± 2.774
apache 16KB	1562.214 ± 15.322 (x16.5)	94.704 ± 0.648
apache 32KB	532.964 ± 10.288 (x10.0)	53.366 ± 0.575
apache 128KB	232.794 ± 6.071 (x15.8)	14.688 ± 0.194

Full set of benchmarks, each run for 3 seconds:

Benchmark	map navigation (ops/ms)	cbor (ops/ms)	JOL (ops/s)
apache 1KB	3411.148 ± 269.988 (x7.0)	486.767 ± 34.517	2.341 ± 0.159
apache 2KB	2824.454 ± 191.709 (x8.1)	349.975 ± 25.439	2.230 ± 0.300
apache 4KB	2399.100 ± 166.685 (x8.3)	289.526 ± 19.217	2.312 ± 0.129
apache 16KB	1618.269 ± 66.494 (x17.0)	95.368 ± 7.417	2.328 ± 0.145
apache 32KB	547.731 ± 33.207 (x10.7)	51.898 ± 2.962	1.935 ± 0.103
apache 128KB	233.352 ± 10.044 (x16.6)	14.653 ± 0.877	2.345 ± 0.136
cloudTrail 1KB	995.575 ± 28.435 (x4.0)	245.268 ± 9.794	2.379 ± 0.138
cloudTrail 2KB	654.018 ± 32.642 (x3.3)	197.738 ± 16.743	2.347 ± 0.129
cloudTrail 4KB	604.989 ± 26.025 (x3.7)	161.719 ± 11.014	1.997 ± 0.096
cloudTrail 16KB	612.762 ± 25.644 (x6.9)	88.038 ± 6.166	2.074 ± 0.133
cloudTrail 32KB	551.232 ± 30.878 (x11.2)	49.984 ± 2.780	2.152 ± 0.143
cloudTrail 128KB	258.711 ± 12.238 (x18.4)	14.476 ± 1.290	2.081 ± 0.102
snmp 1KB	1128.517 ± 33.982 (x3.6)	312.146 ± 20.351	2.118 ± 0.139
snmp 2KB	715.210 ± 34.349 (x4.2)	168.884 ± 9.136	2.315 ± 0.107
snmp 4KB	294.513 ± 84.895 (x4.7)	61.864 ± 22.254	1.373 ± 0.453
snmp 16KB	115.842 ± 9.596 (x4.8)	23.650 ± 1.413	2.456 ± 0.151
snmp 32KB	42.389 ± 5.794 (x3.5)	11.942 ± 0.707	1.708 ± 0.137
snmp 128KB	14.783 ± 1.564 (x4.8)	2.936 ± 0.202	2.391 ± 0.136
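
To read the columns side by side, it helps to put them on the same unit. The sketch below assumes the (xN) factor is the map navigation throughput divided by the CBOR throughput, and converts ops/ms to ops/s so the values can be compared with the JOL column; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not part of this PR: compares throughput figures.
final class ThroughputRatio {

    // How many times faster map navigation is than CBOR (both in ops/ms).
    static double speedup(double mapNavigationOpsPerMs, double cborOpsPerMs) {
        return mapNavigationOpsPerMs / cborOpsPerMs;
    }

    // Converts ops/ms to ops/s, the unit used for the JOL column.
    static double toOpsPerSecond(double opsPerMs) {
        return opsPerMs * 1000.0;
    }

    public static void main(String[] args) {
        // "apache 1KB" row of the full benchmark set (mean values only).
        double mapNavigation = 3411.148; // ops/ms
        double cbor = 486.767;           // ops/ms
        double jol = 2.341;              // ops/s

        System.out.printf(Locale.ROOT, "map navigation vs cbor: x%.1f%n", speedup(mapNavigation, cbor));
        System.out.printf(Locale.ROOT, "cbor vs JOL: x%.0f%n", toOpsPerSecond(cbor) / jol);
    }
}
```

The calculation uses the mean values only and ignores the error margins reported next to them.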

Analysis of the results

JOL and hprof provide the retained size of the object graph. Hprof is not a viable solution for runtime measures and is used only as a reference. JOL navigates the graph more deeply and takes in a big chunk of the JRuby runtime classes (I think).
The ConvertedMap custom navigation is consistently below the raw size for such small events, while CBOR is consistently above it. The magnitude of the variation depends on the event structure.
The ConvertedMap calculation doesn't include the keys because they are interned, which would explain why its delta against raw is consistently negative.
From a performance perspective, the ConvertedMap custom navigation performs better than both CBOR serialization and JOL. JOL is orders of magnitude slower than the others (measured in ops per second instead of ops per millisecond).
Map navigation and CBOR run in the order of thousands to millions of ops per second depending on the event size, so they don't introduce any performance penalty for Logstash.
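
To illustrate why leaving out the keys pushes the estimate below the raw size, here is a minimal sketch of a map navigation estimate. It is not the ConvertedMap code from this PR: the class, the method names and the sample event are hypothetical, and it assumes only string values are counted, by their UTF-8 byte length.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Logstash ConvertedMap implementation.
final class MapNavigationSketch {

    // Walks nested maps and lists, summing the UTF-8 length of string values.
    // Keys are added only when includeKeys is true.
    static long estimate(Object node, boolean includeKeys) {
        long total = 0;
        if (node instanceof Map<?, ?>) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                if (includeKeys) {
                    total += utf8Length(String.valueOf(entry.getKey()));
                }
                total += estimate(entry.getValue(), includeKeys);
            }
        } else if (node instanceof Iterable<?>) {
            for (Object item : (Iterable<?>) node) {
                total += estimate(item, includeKeys);
            }
        } else if (node instanceof CharSequence) {
            total += utf8Length(node.toString());
        }
        // Assumption: numbers, booleans and nulls are ignored in this sketch.
        return total;
    }

    static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Small made-up event: {"message": "hello world", "host": {"name": "web-1"}}
    static Map<String, Object> sampleEvent() {
        Map<String, Object> host = new LinkedHashMap<>();
        host.put("name", "web-1");
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("message", "hello world");
        event.put("host", host);
        return event;
    }

    public static void main(String[] args) {
        System.out.println("without keys: " + estimate(sampleEvent(), false));
        System.out.println("with keys:    " + estimate(sampleEvent(), true));
    }
}
```

For this sample the keys account for roughly half of the counted bytes, which is the kind of gap visible in the events with many short fields.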

Closes #17736

Jun 30 '25 09:06 andsel

:robot: GitHub comments

Just comment with:

run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

Jun 30 '25 09:06 github-actions[bot]

This pull request does not have a backport label. Could you fix it @andsel? 🙏 To fixup this pull request, you need to add the backport labels for the needed branches, such as:

backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
If no backport is necessary, please add the backport-skip label

Jun 30 '25 09:06 mergify[bot]

:broken_heart: Build Failed

Buildkite Build
Commit: 9ac11d125b5a288d82130afefc6b728c20cdb0a8

Failed CI Steps

History

:broken_heart: Build #3086 failed be7d5febf357bd2a7f997b0519d7f5235682eb47
:broken_heart: Build #3084 failed d47de1aac4737da4362448b73e8cd48b6835156b
:broken_heart: Build #3081 failed ce8a19e85daad3205865171550818386b96f6e6c
:broken_heart: Build #3073 failed 9067f70b66f5194ef42b9f1058640489525597fe

Jul 08 '25 07:07 elasticmachine