opentelemetry-proto
Benchmark and speed up metric proto while maintaining semantics
Recent changes to metric messages resulted in ~~significant~~ some speed reduction compared to earlier versions. Here is a comparison between 0.4 and the current latest:
```
BenchmarkEncode/OTLP_0.4/Metric/Int64-8 3368 1767491 ns/op 183888 B/op 9446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8 2877 2088807 ns/op 207888 B/op 10946 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Summary-8 9207 657781 ns/op 62800 B/op 3246 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Summary-8 9018 653362 ns/op 57360 B/op 2946 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Histogram-8 6349 920041 ns/op 86288 B/op 4546 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Histogram-8 7162 828377 ns/op 74704 B/op 3646 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/HistogramSeries-8 1630 3739928 ns/op 352273 B/op 18946 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/HistogramSeries-8 1880 3183887 ns/op 296657 B/op 14446 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Mix-8 1812 3314394 ns/op 331665 B/op 17146 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Mix-8 1651 3906334 ns/op 336465 B/op 17446 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/MixSeries-8 415 14762737 ns/op 1400724 B/op 74746 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/MixSeries-8 360 16274769 ns/op 1441106 B/op 76246 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Int64-8 9387 643925 ns/op 201489 B/op 7046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8 8712 677251 ns/op 210289 B/op 7146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Summary-8 26812 227705 ns/op 75888 B/op 2246 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Summary-8 28062 209540 ns/op 63072 B/op 2146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Histogram-8 20109 307286 ns/op 99872 B/op 2946 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Histogram-8 21282 268512 ns/op 87872 B/op 2846 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/HistogramSeries-8 4790 1249235 ns/op 367076 B/op 12046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/HistogramSeries-8 5763 1009533 ns/op 335877 B/op 11146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Mix-8 4977 1157008 ns/op 377235 B/op 12148 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Mix-8 4776 1158076 ns/op 361218 B/op 12048 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/MixSeries-8 1231 4880358 ns/op 1484439 B/op 51748 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/MixSeries-8 1333 4635400 ns/op 1484425 B/op 50048 allocs/op
```
We see that some benchmarks are ~~almost 3x~~ about 25% slower in the latest version. With more changes planned, like https://github.com/open-telemetry/opentelemetry-proto/pull/283, it is going to become slower.
The composition of the messages used for benchmarking can be seen here: https://github.com/tigrannajaryan/exp-otelproto/blob/7287c246fa86b4996cbf7a2b27f93e78b92a126f/encodings/baseline/generator.go#L332, and this is the source code for the benchmarks: https://github.com/tigrannajaryan/exp-otelproto/blob/7287c246fa86b4996cbf7a2b27f93e78b92a126f/encodings/encoding_test.go#L131
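For readers who don't want to dig through the testbed, the measurement shape is roughly the following (a sketch, not the testbed code; the generated package import is a hypothetical stand-in for the generated OTLP structs):

```go
package otlpbench

import (
	"testing"

	"google.golang.org/protobuf/proto"

	// Hypothetical generated package; the real benchmarks use the
	// generated structs from the linked testbed.
	metricspb "example.com/otlp/gen/metrics/v1"
)

func BenchmarkEncodeMetrics(b *testing.B) {
	msg := &metricspb.ResourceMetrics{} // populate a representative batch
	b.ReportAllocs()                    // produces the B/op and allocs/op columns
	for i := 0; i < b.N; i++ {
		if _, err := proto.Marshal(msg); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkDecodeMetrics(b *testing.B) {
	msg := &metricspb.ResourceMetrics{} // populate a representative batch
	data, err := proto.Marshal(msg)
	if err != nil {
		b.Fatal(err)
	}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var out metricspb.ResourceMetrics
		if err := proto.Unmarshal(data, &out); err != nil {
			b.Fatal(err)
		}
	}
}
```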
We need to go over the messages and see what can be improved before we declare the metric proto stable.
I will see if I can find time to do it myself once we are done with all the planned changes. Otherwise, it would be great for someone else to have a look, and ideally to keep benchmarking continuously as changes are introduced, to avoid degrading performance too much.
Discussion from the data model SIG:
Previously we had acknowledged this as an issue, and it was asserted that this could be fixed in the Go implementation. Additionally, we had a C# implementation that did not show this slowdown.
We see this as follow-on work:
- Clarify what performance targets we WANT for OTLP so we can understand/evaluate changes.
- One proposal: Compare against other popular metric formats to determine relative performance.
- Another comment: We should be as optimal as possible without sacrificing usability.
- Clarify the gap in performance
- Duplicate Tigran's benchmark in non-go language to determine if the slowdown is due to Go proto implementation
- Identify whether we think we can recoup performance degradation in Go
FYI: I'm not seeing any major performance differences between v0.4.0 and v0.8.0 of the OTLP protocol in C#. See open-telemetry/opentelemetry-dotnet#1953 for test results plus the PR with the benchmark tests.
Quick note: I looked deeper into the Go benchmark tests, and the current benchmark is using the deprecated IntGauge, which uses IntDataPoint, which DOES NOT have the `oneof value { as_int, as_double }`. So, the extra time is coming from somewhere else!
Thanks @victlu for looking. I don't have time to check the Go benchmark code itself right now, but it may well be the case that something is wrong in how the measurements are done. If anyone is able to look further, that's great; otherwise I will see what I can do when I am done with my current task.
One HUGE difference is that the OTLP_0.4 version DOES NOT set labels per point! If I comment out labels for OTLP_HEAD, I get the following results. So, the encoding difference is now only 25% more:
```
BenchmarkEncode/OTLP_0.4/Metric/Int64-8 7454 811536 ns/op 69456 B/op 3446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8 6645 1018137 ns/op 77456 B/op 3946 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Int64-8 15786 387542 ns/op 94273 B/op 2046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8 13786 416626 ns/op 103073 B/op 2146 allocs/op
```
Disclaimer: This is first time I'm doing Go stuff, so I'm prone to misunderstanding everything!
@victlu good catch, thank you! I added the labels to OTLP_0.4, since it is more representative to have some labels, and re-ran the benchmark. It is good to see that the degradation for int64 is much smaller, and for histograms HEAD is actually faster than 0.4:
```
BenchmarkEncode/OTLP_0.4/Metric/Int64-8 3368 1767491 ns/op 183888 B/op 9446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8 2877 2088807 ns/op 207888 B/op 10946 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Summary-8 9207 657781 ns/op 62800 B/op 3246 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Summary-8 9018 653362 ns/op 57360 B/op 2946 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Histogram-8 6349 920041 ns/op 86288 B/op 4546 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Histogram-8 7162 828377 ns/op 74704 B/op 3646 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/HistogramSeries-8 1630 3739928 ns/op 352273 B/op 18946 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/HistogramSeries-8 1880 3183887 ns/op 296657 B/op 14446 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Mix-8 1812 3314394 ns/op 331665 B/op 17146 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Mix-8 1651 3906334 ns/op 336465 B/op 17446 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/MixSeries-8 415 14762737 ns/op 1400724 B/op 74746 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/MixSeries-8 360 16274769 ns/op 1441106 B/op 76246 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Int64-8 9387 643925 ns/op 201489 B/op 7046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8 8712 677251 ns/op 210289 B/op 7146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Summary-8 26812 227705 ns/op 75888 B/op 2246 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Summary-8 28062 209540 ns/op 63072 B/op 2146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Histogram-8 20109 307286 ns/op 99872 B/op 2946 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Histogram-8 21282 268512 ns/op 87872 B/op 2846 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/HistogramSeries-8 4790 1249235 ns/op 367076 B/op 12046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/HistogramSeries-8 5763 1009533 ns/op 335877 B/op 11146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Mix-8 4977 1157008 ns/op 377235 B/op 12148 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Mix-8 4776 1158076 ns/op 361218 B/op 12048 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/MixSeries-8 1231 4880358 ns/op 1484439 B/op 51748 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/MixSeries-8 1333 4635400 ns/op 1484425 B/op 50048 allocs/op
```
If you notice any other discrepancies, do let me know.
Long-term it would be great to have the benchmarks in the opentelemetry-go repo, so that we don't need to rely on my fragile personal testbed.
I will update this issue description to reflect that the 3x slowdown was a false alarm, although it is still desirable to run the benchmarks before declaring the protocol stable.
Maybe we can come up with something where the benchmark is language-neutral, so I can use your bench directly in C# as well!
Yes, it would be nice to benchmark the same scenario and the same data composition in all languages. At least the generator can be completely language-independent. We can create a generator tool that creates the messages, serializes them to bytes (maybe stored in files), then feeds the serialized form to language-specific benchmarks that measure decoding/encoding.
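A minimal sketch of that generate-once, benchmark-everywhere flow (the generated package and output filename are hypothetical; the real generator would populate a representative batch):

```go
package main

import (
	"os"

	"google.golang.org/protobuf/proto"

	// Hypothetical generated package standing in for the OTLP metric protos.
	metricspb "example.com/otlp/gen/metrics/v1"
)

func main() {
	// Build one representative batch of metrics (population elided).
	msg := &metricspb.ResourceMetrics{}

	data, err := proto.Marshal(msg)
	if err != nil {
		panic(err)
	}
	// Each language's benchmark then reads and decodes this same file,
	// so all implementations measure identical payloads.
	if err := os.WriteFile("metrics_testdata.bin", data, 0o644); err != nil {
		panic(err)
	}
}
```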
I ran a comparison between IntGauge (to be deprecated) vs. Gauge using oneof vs. an alternative to oneof.
Encoding
- Overall 11% slower between proto 0.4 and 0.8.
- From IntGauge to Gauge, OneOf vs. Alternative:
  - 14% slower with oneof
  - 6% slower with the alternative

Decoding
- Overall 5% slower between proto 0.4 and 0.8.
- From IntGauge to Gauge, OneOf vs. Alternative:
  - 15.7% slower with oneof
Alternative to oneof
Use `num_type` to specify mutual exclusivity between `num_double` and `num_int`. This pattern is supported on all platforms.
```proto
enum NumberEnum {
NUMBER_UNKNOWN = 0;
NUMBER_DOUBLE = 1;
NUMBER_INT = 2;
}
message NumberDataPoint {
// ...
NumberEnum num_type = 6;
double num_double = 7;
sfixed64 num_int = 8;
}
```
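To make the cost concrete, here is a rough Go sketch of why the two layouts behave differently. The struct shapes only approximate what protoc-gen-go emits (they are not generator output), but they show the extra interface indirection and per-point allocation the oneof form pays:

```go
package main

import "fmt"

// oneof form: generated Go wraps each case in its own struct behind an
// interface, so setting a point's value allocates and reading it needs a
// type assertion or switch.
type isValue interface{ isValue() }

type ValueAsInt struct{ AsInt int64 }
type ValueAsDouble struct{ AsDouble float64 }

func (*ValueAsInt) isValue()    {}
func (*ValueAsDouble) isValue() {}

type NumberDataPointOneof struct {
	Value isValue // extra pointer + heap allocation per data point
}

// enum+fields alternative: plain fields, no interface, no extra allocation.
type NumberType int32

const (
	NumberUnknown NumberType = 0
	NumberDouble  NumberType = 1
	NumberInt     NumberType = 2
)

type NumberDataPointEnum struct {
	NumType   NumberType
	NumDouble float64
	NumInt    int64
}

func main() {
	// oneof: the value lives behind an interface.
	p1 := NumberDataPointOneof{Value: &ValueAsInt{AsInt: 42}} // allocates
	if v, ok := p1.Value.(*ValueAsInt); ok {
		fmt.Println(v.AsInt)
	}

	// enum+fields: a branch on an int, fields read directly.
	p2 := NumberDataPointEnum{NumType: NumberInt, NumInt: 42}
	switch p2.NumType {
	case NumberInt:
		fmt.Println(p2.NumInt)
	case NumberDouble:
		fmt.Println(p2.NumDouble)
	}
}
```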
Results
- OTLP_0.4 = Proto 0.4
- OTLP_HEAD = Proto 0.8 using IntGauge (that is to be deprecated)
- OTLP_ONEOF = Proto 0.8
- OTLP_ENUMTYPE = Proto 0.8 using the alternative outlined above
```
BenchmarkEncode/OTLP_0.4/Metric/Int64-8 2640 2393619 ns/op 183889 B/op 9446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8 1994 2691160 ns/op 207890 B/op 10946 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Int64-8 2023 3130439 ns/op 216082 B/op 10946 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Int64-8 2061 2883361 ns/op 207890 B/op 10946 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Summary-8 7050 893153 ns/op 62800 B/op 3246 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Summary-8 6418 941001 ns/op 57360 B/op 2946 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Summary-8 7374 942849 ns/op 57360 B/op 2946 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Summary-8 7066 933354 ns/op 57360 B/op 2946 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Histogram-8 4836 1239377 ns/op 86288 B/op 4546 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Histogram-8 5192 1166733 ns/op 74704 B/op 3646 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Histogram-8 4526 1159576 ns/op 74704 B/op 3646 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Histogram-8 5032 1162089 ns/op 74704 B/op 3646 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/HistogramSeries-8 1248 4933819 ns/op 352276 B/op 18946 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/HistogramSeries-8 1495 4238791 ns/op 296660 B/op 14446 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/HistogramSeries-8 1526 4188194 ns/op 296659 B/op 14446 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/HistogramSeries-8 1422 4205836 ns/op 296659 B/op 14446 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/Mix-8 1387 4427703 ns/op 331666 B/op 17146 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Mix-8 1191 4914993 ns/op 336467 B/op 17446 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Mix-8 1182 5208637 ns/op 344659 B/op 17446 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Mix-8 1278 4905871 ns/op 336467 B/op 17446 allocs/op
BenchmarkEncode/OTLP_0.4/Metric/MixSeries-8 328 17831027 ns/op 1400729 B/op 74746 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/MixSeries-8 314 19079056 ns/op 1441115 B/op 76246 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/MixSeries-8 298 20027797 ns/op 1449307 B/op 76246 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/MixSeries-8 315 19376281 ns/op 1449307 B/op 76246 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Int64-8 7446 969204 ns/op 201491 B/op 7046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8 5598 1021838 ns/op 210291 B/op 7146 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Int64-8 5410 1213289 ns/op 211877 B/op 7646 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Int64-8 5298 1007639 ns/op 218291 B/op 7146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Summary-8 15856 360051 ns/op 75889 B/op 2246 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Summary-8 16406 384796 ns/op 63072 B/op 2146 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Summary-8 15702 388343 ns/op 63072 B/op 2146 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Summary-8 16026 373828 ns/op 63072 B/op 2146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Histogram-8 14041 437210 ns/op 99873 B/op 2946 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Histogram-8 13273 455023 ns/op 87872 B/op 2846 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Histogram-8 14428 445726 ns/op 87872 B/op 2846 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Histogram-8 14431 463288 ns/op 87873 B/op 2846 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/HistogramSeries-8 3122 1760839 ns/op 367079 B/op 12046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/HistogramSeries-8 3453 1597534 ns/op 335879 B/op 11146 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/HistogramSeries-8 3976 1550361 ns/op 335879 B/op 11146 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/HistogramSeries-8 4100 1535395 ns/op 335879 B/op 11146 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/Mix-8 3564 1708633 ns/op 377237 B/op 12148 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Mix-8 3022 1779530 ns/op 361220 B/op 12048 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Mix-8 2941 1917367 ns/op 362822 B/op 12548 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Mix-8 3722 1796725 ns/op 369221 B/op 12048 allocs/op
BenchmarkDecode/OTLP_0.4/Metric/MixSeries-8 832 7022575 ns/op 1484446 B/op 51748 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/MixSeries-8 834 6873600 ns/op 1484433 B/op 50048 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/MixSeries-8 788 7831306 ns/op 1486045 B/op 52548 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/MixSeries-8 888 6897834 ns/op 1524433 B/op 50048 allocs/op
```
A few things that I have found affect performance in Go:
- Oneof is slow since it is implemented as an interface, which adds an indirection and an allocation. The alternative to oneof is a message with plain fields (and optionally an enum if needed). If there is a oneof inside a oneof, maybe rearrange the messages to have only one oneof.
- The number of allocations is driven by the number of separate messages (see if a message can be eliminated, especially deep in the message tree).
- Memory usage is impacted by the order of the fields due to alignment rules. Reordering the fields in a message may result in less memory usage.
- Maps are pretty slow. KV lists are significantly faster (and this is the reason we did this for attributes); see the sketch after this list.
- `bytes` fields that are of fixed size (e.g. 8 or 16 bytes) can be faster if stored as a `fixed64` (or a pair of them) since they no longer require an additional allocation (e.g. this is how Jaeger stores traceid/spanid).
- (This one I am not totally sure I remember correctly.) I think `fixed32/64` was a tiny bit faster than `int32/64` (but may be larger on the wire depending on the value).
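A minimal illustration of the map vs. KV-list point (simplified string-only types as an assumption, not the generated protos), runnable with `go test -bench`:

```go
package kvbench

import "testing"

// KeyValue loosely mirrors the shape of the proto KV pair.
type KeyValue struct{ Key, Value string }

var sink int // prevents the compiler from optimizing the loop bodies away

func BenchmarkLabelsAsMap(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		m := make(map[string]string, 2) // hash-map header + buckets allocate
		m["host"] = "h1"
		m["region"] = "us-east"
		sink = len(m)
	}
}

func BenchmarkLabelsAsKVList(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		kvs := make([]KeyValue, 0, 2) // one contiguous allocation
		kvs = append(kvs,
			KeyValue{"host", "h1"},
			KeyValue{"region", "us-east"})
		sink = len(kvs)
	}
}
```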
Plan for working on this bug from the Metrics Data Model SIG:
- Use Tigran's benchmarking suite to investigate the Label => Attribute change and possible improvements
- This work will be timeboxed to an April 30th deadline
- We will merge the "Label => Attribute" PR into the proto directory but NOT issue a release until this work is complete.
The work will be led by me and @victlu, with @tigrannajaryan providing guidance (thanks for the above comment!). If anyone has time to help, feel free to comment on this bug.
So here are a few results I gathered using GoGoProto (faster) across every version of OTLP from 0.4 to now:
```
BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Int64-8 26749 43914 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Int64-8 18772 67075 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Int64-8 19136 62446 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Int64-8 18956 63186 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Int64-8 17336 70961 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Int64-8 15318 74221 ns/op
BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Summary-8 69717 18255 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Summary-8 53949 21852 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Summary-8 54154 21925 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Summary-8 54010 21895 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Summary-8 52406 22943 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Summary-8 35911 32651 ns/op
BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Histogram-8 48699 23925 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Histogram-8 39019 29297 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Histogram-8 38566 30174 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Histogram-8 40514 29930 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Histogram-8 36490 31997 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Histogram-8 28291 42025 ns/op
BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/HistogramSeries-8 10000 101948 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/HistogramSeries-8 9901 127268 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/HistogramSeries-8 9841 114856 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/HistogramSeries-8 10000 109458 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/HistogramSeries-8 10000 114854 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/HistogramSeries-8 7179 240937 ns/op
BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Mix-8 8173 137942 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Mix-8 6388 163060 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Mix-8 6651 173666 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Mix-8 5416 202425 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Mix-8 5396 188160 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Mix-8 5061 212527 ns/op
BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/MixSeries-8 2090 608697 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/MixSeries-8 1459 795073 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/MixSeries-8 1659 734161 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/MixSeries-8 1671 824701 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/MixSeries-8 1184 1285495 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/MixSeries-8 1050 1281293 ns/op
BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Int64-8 3009 348570 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Int64-8 3894 287816 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Int64-8 3750 289633 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Int64-8 3910 285599 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Int64-8 3678 293326 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Int64-8 3639 304446 ns/op
BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Summary-8 13567 88821 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Summary-8 15590 77919 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Summary-8 15488 78906 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Summary-8 15544 76030 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Summary-8 15144 80054 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Summary-8 12070 99301 ns/op
BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Histogram-8 10000 116692 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Histogram-8 11652 123209 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Histogram-8 9837 108647 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Histogram-8 9891 115249 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Histogram-8 9692 118362 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Histogram-8 7743 146608 ns/op
BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/HistogramSeries-8 2151 474082 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/HistogramSeries-8 2374 448891 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/HistogramSeries-8 2516 483569 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/HistogramSeries-8 2412 468562 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/HistogramSeries-8 2698 476729 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/HistogramSeries-8 1855 592413 ns/op
BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Mix-8 2259 547694 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Mix-8 2108 526082 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Mix-8 2148 515173 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Mix-8 2185 507508 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Mix-8 2295 522157 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Mix-8 2098 604631 ns/op
BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/MixSeries-8 453 2390820 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/MixSeries-8 571 2131050 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/MixSeries-8 513 2314529 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/MixSeries-8 489 2085016 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/MixSeries-8 568 2307425 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/MixSeries-8 447 2636537 ns/op
```
Additionally, a look at the encoded byte sizes:
```
Encoding                                 Uncompressed [improvement vs 0.4]   zlib [improvement]   zstd [improvement]
OTLP 0.4 (Gogo Faster)/Metric/Gauge 29982 bytes [1.000], zlib 1750 bytes [1.000], zstd 1861 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/Gauge 29882 bytes [1.003], zlib 1743 bytes [1.004], zstd 1858 bytes [1.002]
OTLP 0.6 (Gogo Faster)/Metric/Gauge 29882 bytes [1.003], zlib 1743 bytes [1.004], zstd 1858 bytes [1.002]
OTLP 0.7 (Gogo Faster)/Metric/Gauge 29882 bytes [1.003], zlib 1743 bytes [1.004], zstd 1858 bytes [1.002]
OTLP 0.8 (Gogo Faster)/Metric/Gauge 34382 bytes [0.872], zlib 1998 bytes [0.876], zstd 1920 bytes [0.969]
OTLP HEAD (Gogo Faster)/Metric/Gauge 34382 bytes [0.872], zlib 1998 bytes [0.876], zstd 1920 bytes [0.969]
Encoding                                 Uncompressed [improvement vs 0.4]   zlib [improvement]   zstd [improvement]
OTLP 0.4 (Gogo Faster)/Metric/Histogram 13170 bytes [1.000], zlib 1824 bytes [1.000], zstd 1313 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/Histogram 14580 bytes [0.903], zlib 1849 bytes [0.986], zstd 1361 bytes [0.965]
OTLP 0.6 (Gogo Faster)/Metric/Histogram 14580 bytes [0.903], zlib 1849 bytes [0.986], zstd 1361 bytes [0.965]
OTLP 0.7 (Gogo Faster)/Metric/Histogram 14580 bytes [0.903], zlib 1849 bytes [0.986], zstd 1361 bytes [0.965]
OTLP 0.8 (Gogo Faster)/Metric/Histogram 15480 bytes [0.851], zlib 1882 bytes [0.969], zstd 1341 bytes [0.979]
OTLP HEAD (Gogo Faster)/Metric/Histogram 15880 bytes [0.829], zlib 1861 bytes [0.980], zstd 1373 bytes [0.956]
Encoding                                 Uncompressed [improvement vs 0.4]   zlib [improvement]   zstd [improvement]
OTLP 0.4 (Gogo Faster)/Metric/MixOne 53332 bytes [1.000], zlib 3921 bytes [1.000], zstd 3876 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/MixOne 53142 bytes [1.004], zlib 3894 bytes [1.007], zstd 3897 bytes [0.995]
OTLP 0.6 (Gogo Faster)/Metric/MixOne 53142 bytes [1.004], zlib 3894 bytes [1.007], zstd 3897 bytes [0.995]
OTLP 0.7 (Gogo Faster)/Metric/MixOne 53142 bytes [1.004], zlib 3894 bytes [1.007], zstd 3897 bytes [0.995]
OTLP 0.8 (Gogo Faster)/Metric/MixOne 59442 bytes [0.897], zlib 4232 bytes [0.927], zstd 3927 bytes [0.987]
OTLP HEAD (Gogo Faster)/Metric/MixOne 60242 bytes [0.885], zlib 4251 bytes [0.922], zstd 4013 bytes [0.966]
Encoding                                 Uncompressed [improvement vs 0.4]   zlib [improvement]   zstd [improvement]
OTLP 0.4 (Gogo Faster)/Metric/MixSeries 199279 bytes [1.000], zlib 12513 bytes [1.000], zstd 14904 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/MixSeries 210498 bytes [0.947], zlib 12660 bytes [0.988], zstd 14564 bytes [1.023]
OTLP 0.6 (Gogo Faster)/Metric/MixSeries 210498 bytes [0.947], zlib 12660 bytes [0.988], zstd 14564 bytes [1.023]
OTLP 0.7 (Gogo Faster)/Metric/MixSeries 210498 bytes [0.947], zlib 12660 bytes [0.988], zstd 14564 bytes [1.023]
OTLP 0.8 (Gogo Faster)/Metric/MixSeries 227742 bytes [0.875], zlib 13412 bytes [0.933], zstd 15063 bytes [0.989]
OTLP HEAD (Gogo Faster)/Metric/MixSeries 231742 bytes [0.860], zlib 13548 bytes [0.924], zstd 14690 bytes [1.015]
```
Obvious points
- Every nested message added increases the serialized size.
- oneof usage has been slowly degrading (Go) performance since 0.4.
- 0.4 => 0.5 saw a major hit to performance from oneofs for metric type vs. enum/descriptor.
- 0.8 => present sees another hit to encoding performance due to Attributes vs. Labels.
- When viewing the "MixSeries-8" benchmark, the relative performance differential is not as noticeable.
@victlu confirmed similar results in Go.
Next action items
- @victlu is going to check oneof performance in C# to see if the drops in performance are as severe
- We're going to investigate using optional fields + a single enum vs. oneof in Go to determine if this improves performance significantly
- I'm going to try to do some isolated changes around larger performance dips to focus on root-cause (oneof, added nested messages or other).
> We're going to investigate using optional fields + a single enum vs. oneof in Go to determine if this improves performance significantly
IIRC: usually, for a very small number of options in a oneof, a struct+enum is both faster and smaller in memory. The more options you add, the larger the memory usage of struct+enum becomes. At some point it starts using more memory than oneof, and for an even larger number of options (if I remember correctly) it starts slowing down due to too much memory used. TL;DR: you will need to measure to see if struct+enum is faster and/or smaller than oneof.
Hint: use `-benchmem` to see memory usage (but be aware that it is the total for everything the benchmark does, so it will need to be interpreted carefully).
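For reference, the standard invocation looks like this (plain Go toolchain flags, nothing project-specific; `-run='^$'` skips regular tests so only benchmarks execute):

```
go test -run='^$' -bench=. -benchmem
```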
Here's a look at "no oneof" in the entire protocol. There is a decent memory usage bump for a performance gain:
```
BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/MixSeries-8 464 3364230 ns/op 853510 B/op 38137 allocs/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/MixSeries-8 454 2783817 ns/op 915910 B/op 36437 allocs/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/MixSeries-8 483 2425550 ns/op 915908 B/op 36437 allocs/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/MixSeries-8 547 2459825 ns/op 915910 B/op 36437 allocs/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/MixSeries-8 457 2591545 ns/op 920704 B/op 39937 allocs/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/MixSeries-8 423 2716310 ns/op 1111123 B/op 43937 allocs/op
BenchmarkDecode/OTLP_HEAD_No_oneof_(Gogo_Faster)/Metric/MixSeries-8 483 2438340 ns/op 1119909 B/op 40137 allocs/op
```
Thanks for the -benchmem tip. Still new to Go.
I ran the .NET C# benchmarks across all our OTLP proto versions. This was an extensive run (~4 hours), so jitter artifacts should not be an issue.
My takeaway from this test comparing v0.4.0 with v0.8.0:
- Computation-wise, overall we are slower by a 6% difference. The outlier is DecodeGauge, which is slower by a 16% difference.
- Memory-allocation-wise, we are better or the same across all operations. In some cases we are better by 20%.
Results
```
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.201
[Host] : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
Job-UQXYHB : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
IterationCount=50 LaunchCount=10 WarmupCount=10
```
| Method | Version | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------- |-------- |----------:|---------:|----------:|----------:|--------:|--------:|------:|----------:|
| EncodeGauge | 0.4.0 | 90.38 us | 0.743 us | 4.922 us | 88.93 us | 20.9961 | 0.1221 | - | 85.98 KB |
| DecodeGauge | 0.4.0 | 80.52 us | 0.359 us | 2.367 us | 79.65 us | 30.2734 | 0.6104 | - | 124.03 KB |
| EncodeSummary | 0.4.0 | 120.16 us | 0.796 us | 5.251 us | 118.80 us | 29.0527 | 6.3477 | - | 118.99 KB |
| DecodeSummary | 0.4.0 | 224.80 us | 2.788 us | 18.662 us | 217.48 us | 49.0723 | 1.2207 | - | 200.59 KB |
| EncodeHistogram | 0.4.0 | 166.26 us | 1.102 us | 7.261 us | 163.73 us | 41.0156 | 13.6719 | - | 168.02 KB |
| DecodeHistogram | 0.4.0 | 277.14 us | 1.827 us | 11.845 us | 272.85 us | 74.2188 | 23.4375 | - | 309.4 KB |
| EncodeGauge | 0.5.0 | 93.26 us | 0.512 us | 3.351 us | 92.47 us | 19.1650 | 0.3662 | - | 78.66 KB |
| DecodeGauge | 0.5.0 | 81.69 us | 0.391 us | 2.530 us | 80.84 us | 28.4424 | - | - | 116.22 KB |
| EncodeSummary | 0.5.0 | 95.05 us | 0.549 us | 3.596 us | 94.34 us | 19.1650 | 0.1221 | - | 78.56 KB |
| DecodeSummary | 0.5.0 | 90.39 us | 0.428 us | 2.757 us | 89.52 us | 30.5176 | 0.4883 | - | 124.81 KB |
| EncodeHistogram | 0.5.0 | 154.30 us | 0.760 us | 4.806 us | 153.25 us | 36.1328 | 0.2441 | - | 147.7 KB |
| DecodeHistogram | 0.5.0 | 260.91 us | 6.413 us | 41.713 us | 245.25 us | 60.5469 | 20.0195 | - | 257.84 KB |
| EncodeGauge | 0.6.0 | 92.14 us | 0.918 us | 5.845 us | 90.29 us | 16.3574 | 0.1221 | - | 67.06 KB |
| DecodeGauge | 0.6.0 | 100.51 us | 1.193 us | 7.711 us | 97.63 us | 29.1748 | 0.2441 | - | 119.34 KB |
| EncodeSummary | 0.6.0 | 89.41 us | 0.709 us | 4.612 us | 87.85 us | 16.6016 | - | - | 67.84 KB |
| DecodeSummary | 0.6.0 | 91.23 us | 0.387 us | 2.566 us | 90.50 us | 28.4424 | - | - | 116.22 KB |
| EncodeHistogram | 0.6.0 | 166.32 us | 1.428 us | 9.412 us | 163.94 us | 36.8652 | 9.2773 | - | 151.83 KB |
| DecodeHistogram | 0.6.0 | 279.38 us | 2.231 us | 14.414 us | 275.05 us | 70.8008 | 22.9492 | - | 304.71 KB |
| EncodeGauge | 0.7.0 | 97.39 us | 1.619 us | 10.779 us | 94.38 us | 16.3574 | 0.1221 | - | 67.06 KB |
| DecodeGauge | 0.7.0 | 99.56 us | 0.987 us | 6.559 us | 97.21 us | 29.0527 | 0.4883 | - | 119.34 KB |
| EncodeSummary | 0.7.0 | 122.87 us | 0.933 us | 6.127 us | 120.65 us | 23.6816 | 4.6387 | - | 97.24 KB |
| DecodeSummary | 0.7.0 | 235.73 us | 2.654 us | 17.616 us | 229.94 us | 46.3867 | 3.9063 | - | 189.66 KB |
| EncodeHistogram | 0.7.0 | 165.69 us | 1.245 us | 8.091 us | 162.92 us | 36.8652 | 9.2773 | - | 151.83 KB |
| DecodeHistogram | 0.7.0 | 273.84 us | 1.980 us | 12.905 us | 269.18 us | 68.8477 | 21.9727 | - | 304.71 KB |
| EncodeGauge | 0.8.0 | 92.84 us | 0.687 us | 4.558 us | 91.60 us | 17.0898 | 3.4180 | - | 70.19 KB |
| DecodeGauge | 0.8.0 | 94.21 us | 0.707 us | 4.688 us | 92.42 us | 29.7852 | 0.2441 | - | 121.69 KB |
| EncodeSummary | 0.8.0 | 127.31 us | 2.397 us | 15.724 us | 121.89 us | 23.8037 | 2.8076 | - | 97.24 KB |
| DecodeSummary | 0.8.0 | 238.60 us | 2.169 us | 14.517 us | 235.57 us | 46.3867 | 4.8828 | - | 189.66 KB |
| EncodeHistogram | 0.8.0 | 176.01 us | 1.245 us | 8.271 us | 173.24 us | 41.0156 | 2.6855 | - | 167.84 KB |
| DecodeHistogram | 0.8.0 | 290.50 us | 2.308 us | 15.303 us | 285.33 us | 68.3594 | 19.5313 | - | 317.21 KB |
Analysis
The Time/Mem Diff columns compare the % difference against the 0.4.0 baseline.
| Method | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---|---|---|---|---|---|
| DecodeGauge | 0.4.0 | 80.52 | 124.03 | 0% | 0% |
| DecodeGauge | 0.5.0 | 81.69 | 116.22 | 1% | 7% |
| DecodeGauge | 0.6.0 | 100.51 | 119.34 | 22% | 4% |
| DecodeGauge | 0.7.0 | 99.56 | 119.34 | 21% | 4% |
| DecodeGauge | 0.8.0 | 94.21 | 121.69 | 16% | 2% |
| Method | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---|---|---|---|---|---|
| EncodeGauge | 0.4.0 | 90.38 | 85.98 | 0% | 0% |
| EncodeGauge | 0.5.0 | 93.26 | 78.66 | 3% | 9% |
| EncodeGauge | 0.6.0 | 92.14 | 67.06 | 2% | 25% |
| EncodeGauge | 0.7.0 | 97.39 | 67.06 | 7% | 25% |
| EncodeGauge | 0.8.0 | 92.84 | 70.19 | 3% | 20% |
| Method | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---|---|---|---|---|---|
| DecodeSummary | 0.4.0 | 224.8 | 200.59 | 0% | 0% |
| DecodeSummary | 0.5.0 | 90.39 | 124.81 | 85% | 47% |
| DecodeSummary | 0.6.0 | 91.23 | 116.22 | 85% | 53% |
| DecodeSummary | 0.7.0 | 235.73 | 189.66 | 5% | 6% |
| DecodeSummary | 0.8.0 | 238.6 | 189.66 | 6% | 6% |
| Method | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---|---|---|---|---|---|
| EncodeSummary | 0.4.0 | 120.16 | 118.99 | 0% | 0% |
| EncodeSummary | 0.5.0 | 95.05 | 78.56 | 23% | 41% |
| EncodeSummary | 0.6.0 | 89.41 | 67.84 | 29% | 55% |
| EncodeSummary | 0.7.0 | 122.87 | 97.24 | 2% | 20% |
| EncodeSummary | 0.8.0 | 127.31 | 97.24 | 6% | 20% |
| Method | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---|---|---|---|---|---|
| DecodeHistogram | 0.4.0 | 277.14 | 309.4 | 0% | 0% |
| DecodeHistogram | 0.5.0 | 260.91 | 257.84 | 6% | 18% |
| DecodeHistogram | 0.6.0 | 279.38 | 304.71 | 1% | 2% |
| DecodeHistogram | 0.7.0 | 273.84 | 304.71 | 1% | 2% |
| DecodeHistogram | 0.8.0 | 290.5 | 317.21 | 5% | 2% |
| Method | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---|---|---|---|---|---|
| EncodeHistogram | 0.4.0 | 166.26 | 168.02 | 0% | 0% |
| EncodeHistogram | 0.5.0 | 154.3 | 147.7 | 7% | 13% |
| EncodeHistogram | 0.6.0 | 166.32 | 151.83 | 0% | 10% |
| EncodeHistogram | 0.7.0 | 165.69 | 151.83 | 0% | 10% |
| EncodeHistogram | 0.8.0 | 176.01 | 167.84 | 6% | 0% |
I compared two experiments to get ideas on what happened to performance and to better isolate the cost of the changes.
The first experiment was removing all "oneof" instances within the protocol and instead hanging the fields directly off the containing message. This should be binary compatible. E.g., here's the change to Metric:
Experiment 1: No oneofs
```proto
message Metric {
// name of the metric, including its DNS name prefix. It must be unique.
string name = 1;
// description of the metric, which can be used in documentation.
string description = 2;
// unit in which the metric value is reported. Follows the format
// described by http://unitsofmeasure.org/ucum.html.
string unit = 3;
Gauge gauge = 5;
Sum sum = 7;
Histogram histogram = 9;
Summary summary = 11;
}
```
Experiment 2: Flatten Metrics
The second experiment attempts to revert to behavior much closer to 0.4 by removing the early bundling of metrics by type and using a Type enum on Metric:
```proto
message Metric {
// name of the metric, including its DNS name prefix. It must be unique.
string name = 1;
// description of the metric, which can be used in documentation.
string description = 2;
// unit in which the metric value is reported. Follows the format
// described by http://unitsofmeasure.org/ucum.html.
string unit = 3;
// Type is the type of values a metric has.
enum Type {
// INVALID_TYPE is the default Type, it MUST not be used.
INVALID_TYPE = 0;
// TODO: doc
GAUGE = 1;
// TODO: doc
SUM = 2;
// Histogram measurement.
// Corresponding values are stored in HistogramDataPoint.
HISTOGRAM = 3;
// Summary value. Some frameworks implemented Histograms as a summary of observations
// (usually things like request durations and response sizes). While it
// also provides a total count of observations and a sum of all observed
// values, it calculates configurable percentiles over a sliding time
// window.
// Corresponding values are stored in SummaryDataPoint.
SUMMARY = 4;
}
// type is the type of values this metric has.
Type type = 4;
// aggregation_temporality describes if the aggregator reports delta changes
// since last report time, or cumulative changes since a fixed start time.
// Only used in Sum/Histogram metrics.
AggregationTemporality aggregation_temporality = 5;
// If "true" means that the sum/gauge is monotonic.
// Only valid for Sums + Gauges.
bool is_monotonic_sum = 6;
// Only exists if Gauge metric
repeated NumberDataPoint sum_or_gauge_data_points = 10;
// Only exists if Sum metric
//repeated NumberDataPoint sum_data_points = 11;
// Only exists if histogram metric
repeated HistogramDataPoint histogram_data_points = 12;
// Only exists if Summary metric
repeated SummaryDataPoint summary_data_points = 13;
}
```
You can find the code (based on @tigrannajaryan's benchmarks) here.
Results
"MixSeries-8" benchmark form linked code. Relative % are from 0.4 baseline.
| Method | Version | Ns / Operation | Bytes / Operation | Allocations |
|---|---|---|---|---|
| Encode | 0.4 | 513374 | 204816 | 2 |
| Encode | 0.8 | 736628 (+43%) | 229392 (+12%) | 2 |
| Encode | HEAD | 953726 (+85%) | 237584 (+16%) | 2 |
| Encode | Exp1: No oneof | 803390 (+56%) | 229392 (+12%) | 2 |
| Encode | Exp2: flat | 601148 (+17%) | 237584 (+16%) | 2 |
| Decode | 0.4 | 2045311 | 853507 | 38137 |
| Decode | 0.8 | 2036024 (99%) | 920704 (107%) | 39937 (104%) |
| Decode | HEAD | 2271604 (111%) | 1111122 (130%) | 43937 (115%) |
| Decode | Exp1: No oneof | 2793006 (136%) | 1279907 (149%) | 50137 (131%) |
| Decode | Exp2: flat | 2241217 (108%) | 1123125 (131%) | 43337 (113%) |
Notes:
- `0.4` was the baseline of the previous metrics. Between 0.4 => 0.8 the `Metric` message was created, which has a oneof for `Histogram`, `Gauge`, `Sum`, etc.
- `HEAD` represents the current `main` branch of the proto repository. The major change here was moving from `String => String` labels to typed Attributes.
- The current metric mix is 8 series per metric, with two labels per metric type. For a future task I'm going to attempt to make this metric mix a bit more realistic with what we'd expect. For now, shifts in metric representation are OVER-emphasized.
Thoughts
- The Label => Attribute shift causes a ~10% allocation + performance penalty. Given the reasons behind this change and the prevalence of attributes in the protocol, I think we can re-evaluate attribute performance across the entire protocol at some point. For now, metrics should "look similar" to other signals.
- Unwinding the `Metric` type to use an enum and repeated data points with optional fields DOES lead to a (~20%) performance gain over the current encoding. We'll need to discuss this in the Metric Data Model SIG. It's possible there are changes like this which lead to better performance at less of a cognitive cost. Note: the decision to use the current sub-metric message modelling occurred in 0.6. That pull request included a performance benchmark, which makes me want to investigate differences between that benchmark and this one, where we do see a slight performance hit w/ 0.6.
Future work
- Improve generation of sampled data
- Ensure percentiles in Summary are always generated
- Add configurable + realistic number of tags/attributes to metrics
- Ensure both Gauge + Sum (with both Double + Int) values are represented.
- Clean up the benchmarks to a repeatable + "check-in" friendly state.
- Discuss future options in the Data Model SIG
From @bogdandrutu: use

```
s+repeated opentelemetry.proto.common.v1.KeyValue \(.*\);+repeated opentelemetry.proto.common.v1.KeyValue \1\
[ (gogoproto.nullable) = false ];+g
```

for key-values.
Here's a capture of the discussion from the Data Model SIG:
- Given @victlu's benchmarks in C#, we think we'll see similar performance in non-Go languages.
- @bogdandrutu had a set of things to try regarding Go specifically.
- The consensus here was that, given the experimental results, it's likely that an alternative approach in the Go implementation would lead to similar benefits without changing the proto structure. Specifically, removing oneofs or using enum + repeated fields are likely things that can be done in a Go-specific way, and they saw little benefit in C#.
- We're downgrading this work from release-blocking, and we'll continue to push on performance ideas and improvements for Go.
> Specifically, removing oneofs or using enum + repeated fields are likely things that can be done in a Go-specific way, and saw little benefit in C#.
I am not aware of an easy way to do this in Go. Protobuf generators implement oneof via interfaces, which are slow in Go. The non-easy (but doable) way to make oneof fast is a custom data type (using Gogoproto; not possible with Goproto) which implements a fast Variant data type like this: https://github.com/tigrannajaryan/govariant
@bogdandrutu led the discussion on fixing Go. I assume he was thinking Gogoproto + a fast variant, but I'll let him comment here. I'm definitely convinced Bogdan can fix this :)
Note OTel-Go has a variant type for its Attribute value. Surely this can be fixed in Go with a custom solution.
@jmacd the problem is that Goproto does not have a way to use custom data structures for messages. At least I am not aware of a way (unless you modify the generated code). Gogoproto supports custom data structures and we use it for some messages already in the Collector for performance reasons (not for attribute values yet, but will possibly do in the future).
BTW, this is likely a faster and more compact variant than what Otel-Go uses: https://github.com/tigrannajaryan/govariant :-)
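For readers unfamiliar with the pattern, the general idea of such a variant is a flat struct with a type tag instead of an interface. This is a sketch of the concept only, not govariant's or OTel-Go's actual layout:

```go
package variant

// Type tags the active field of the variant.
type Type uint8

const (
	TypeEmpty Type = iota
	TypeInt
	TypeString
)

// Value keeps the numeric payload and the string header side by side,
// with a tag choosing between them. There is no interface boxing, so
// storing an int does not heap-allocate.
type Value struct {
	typ Type
	num int64
	str string
}

func IntValue(v int64) Value     { return Value{typ: TypeInt, num: v} }
func StringValue(s string) Value { return Value{typ: TypeString, str: s} }

func (v Value) Type() Type  { return v.typ }
func (v Value) Int() int64  { return v.num }
func (v Value) Str() string { return v.str }
```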
To more clearly delineate the decision documented in: https://github.com/open-telemetry/opentelemetry-proto/issues/287#issuecomment-827787057
This issue was downgraded from release-blocking. We think the performance of the protos is acceptable in most languages (where alternatives don't provide enough benefit), and we think the Go-specific performance issues can be fixed.
The todos are:
- [ ] Determine go-specific performance improvements that do not change the protocol structure (just in-go).
- [ ] Create an ongoing performance suite to measure performance @ each release of OTLP and prevent major (unknown) regressions. (I've been working on taking this; I expect it to be a multi-month effort.)
I can split this out into separate bugs if it helps.
Metrics are now Stable and we are not going to change them in a breaking way, so this is likely non-actionable. Closing.