
Benchmark and speed up metric proto while maintaining semantics

Open tigrannajaryan opened this issue 4 years ago • 24 comments

Recent changes to metric messages resulted in ~significant~ some speed reduction compared to earlier versions. Here is a comparison between 0.4 and the current latest:

BenchmarkEncode/OTLP_0.4/Metric/Int64-8                     3368           1767491 ns/op          183888 B/op       9446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8                    2877           2088807 ns/op          207888 B/op      10946 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Summary-8                   9207            657781 ns/op           62800 B/op       3246 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Summary-8                  9018            653362 ns/op           57360 B/op       2946 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Histogram-8                 6349            920041 ns/op           86288 B/op       4546 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Histogram-8                7162            828377 ns/op           74704 B/op       3646 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/HistogramSeries-8                   1630           3739928 ns/op          352273 B/op      18946 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/HistogramSeries-8                  1880           3183887 ns/op          296657 B/op      14446 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Mix-8                               1812           3314394 ns/op          331665 B/op      17146 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Mix-8                              1651           3906334 ns/op          336465 B/op      17446 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/MixSeries-8                          415          14762737 ns/op         1400724 B/op      74746 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/MixSeries-8                         360          16274769 ns/op         1441106 B/op      76246 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Int64-8                             9387            643925 ns/op          201489 B/op       7046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8                            8712            677251 ns/op          210289 B/op       7146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Summary-8                          26812            227705 ns/op           75888 B/op       2246 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Summary-8                         28062            209540 ns/op           63072 B/op       2146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Histogram-8                        20109            307286 ns/op           99872 B/op       2946 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Histogram-8                       21282            268512 ns/op           87872 B/op       2846 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/HistogramSeries-8                   4790           1249235 ns/op          367076 B/op      12046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/HistogramSeries-8                  5763           1009533 ns/op          335877 B/op      11146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Mix-8                               4977           1157008 ns/op          377235 B/op      12148 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Mix-8                              4776           1158076 ns/op          361218 B/op      12048 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/MixSeries-8                         1231           4880358 ns/op         1484439 B/op      51748 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/MixSeries-8                        1333           4635400 ns/op         1484425 B/op      50048 allocs/op

We see that some benchmarks are ~almost 3x~ about 25% slower in the latest version. With more changes planned, like https://github.com/open-telemetry/opentelemetry-proto/pull/283, it is going to become slower still.

The composition of the messages used for benchmarking can be seen here https://github.com/tigrannajaryan/exp-otelproto/blob/7287c246fa86b4996cbf7a2b27f93e78b92a126f/encodings/baseline/generator.go#L332 and this is the source code for the benchmarks: https://github.com/tigrannajaryan/exp-otelproto/blob/7287c246fa86b4996cbf7a2b27f93e78b92a126f/encodings/encoding_test.go#L131

We need to go over the messages and see what can be improved before we declare the metric proto stable.

I will see if I can find time to do it myself once we are done with all the planned changes. Otherwise it would be great for someone else to have a look, and ideally to keep benchmarking continuously as changes are introduced, so we avoid degrading performance too much.

tigrannajaryan avatar Mar 29 '21 14:03 tigrannajaryan

Discussion from the data model SIG:

Previously we had acknowledged this as an issue, and it was asserted that this could be fixed in the Go implementation. Additionally, we had a C# implementation that did not show this slowdown.

We see this as follow on work:

  • Clarify what performance targets we WANT for OTLP so we can understand/evaluate changes.
    • One proposal: Compare against other popular metric formats to determine relative performance.
    • Another comment: We should be as optimal as possible without sacrificing usability.
  • Clarify the gap in performance
    • Duplicate Tigran's benchmark in a non-Go language to determine whether the slowdown is due to the Go proto implementation
    • Identify whether we think we can recoup performance degradation in Go

jsuereth avatar Mar 30 '21 18:03 jsuereth

FYI: I'm not seeing any major performance differences between v0.4.0 and v0.8.0 of the OTLP protocol in C#. See open-telemetry/opentelemetry-dotnet#1953 for test results, plus the PR for the benchmark tests.

victlu avatar Apr 01 '21 01:04 victlu

Quick Note. I looked deeper into the Go benchmark tests, and the current benchmark is using the deprecated IntGauge, which uses IntDataPoint, which DOES NOT have the oneof value { as_int, as_double }. So, the extra time is coming from somewhere else!

victlu avatar Apr 06 '21 21:04 victlu

Quick Note. I looked deeper into the Go benchmark tests, and the current benchmark is using the deprecated IntGauge, which uses IntDataPoint, which DOES NOT have the oneof value { as_int, as_double }. So, the extra time is coming from somewhere else!

Thanks @victlu for looking. I don't have time to check the Go benchmark code itself right now, but it may well be the case that something is wrong in how the measurements are done. If anyone is able to look further that's great, otherwise I will see what I can do when I am done with my current task.

tigrannajaryan avatar Apr 06 '21 21:04 tigrannajaryan

One HUGE difference is that the OTLP_0.4 version DOES NOT set labels per point! If I comment out labels for OTLP_HEAD, I get the following results. So, the encoding difference is now only about 25%.

BenchmarkEncode/OTLP_0.4/Metric/Int64-8                     7454            811536 ns/op           69456 B/op       3446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8                    6645           1018137 ns/op           77456 B/op       3946 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Int64-8                    15786            387542 ns/op           94273 B/op       2046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8                   13786            416626 ns/op          103073 B/op       2146 allocs/op

Disclaimer: This is first time I'm doing Go stuff, so I'm prone to misunderstanding everything!

victlu avatar Apr 06 '21 22:04 victlu

@victlu good catch, thank you! I added the labels to OTLP_0.4, since it is more representative to have some labels, and re-ran the benchmark. It is good to see that the degradation for Int64 is much smaller, and that for histograms HEAD is actually faster than 0.4:

BenchmarkEncode/OTLP_0.4/Metric/Int64-8                     3368           1767491 ns/op          183888 B/op       9446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8                    2877           2088807 ns/op          207888 B/op      10946 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Summary-8                   9207            657781 ns/op           62800 B/op       3246 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Summary-8                  9018            653362 ns/op           57360 B/op       2946 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Histogram-8                 6349            920041 ns/op           86288 B/op       4546 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Histogram-8                7162            828377 ns/op           74704 B/op       3646 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/HistogramSeries-8                   1630           3739928 ns/op          352273 B/op      18946 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/HistogramSeries-8                  1880           3183887 ns/op          296657 B/op      14446 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Mix-8                               1812           3314394 ns/op          331665 B/op      17146 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Mix-8                              1651           3906334 ns/op          336465 B/op      17446 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/MixSeries-8                          415          14762737 ns/op         1400724 B/op      74746 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/MixSeries-8                         360          16274769 ns/op         1441106 B/op      76246 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Int64-8                             9387            643925 ns/op          201489 B/op       7046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8                            8712            677251 ns/op          210289 B/op       7146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Summary-8                          26812            227705 ns/op           75888 B/op       2246 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Summary-8                         28062            209540 ns/op           63072 B/op       2146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Histogram-8                        20109            307286 ns/op           99872 B/op       2946 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Histogram-8                       21282            268512 ns/op           87872 B/op       2846 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/HistogramSeries-8                   4790           1249235 ns/op          367076 B/op      12046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/HistogramSeries-8                  5763           1009533 ns/op          335877 B/op      11146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Mix-8                               4977           1157008 ns/op          377235 B/op      12148 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Mix-8                              4776           1158076 ns/op          361218 B/op      12048 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/MixSeries-8                         1231           4880358 ns/op         1484439 B/op      51748 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/MixSeries-8                        1333           4635400 ns/op         1484425 B/op      50048 allocs/op

If you notice any other discrepancies do let me know.

Long-term it would be great to have the benchmarks in opentelemetry-go repo, so that we don't need to rely on my fragile personal testbed.

I will update this issue description to reflect that the 3x slowdown was a false alarm, although it is still desirable to run the benchmarks before declaring the protocol stable.

tigrannajaryan avatar Apr 06 '21 22:04 tigrannajaryan

Maybe we can come up with something where the benchmark is language-neutral, so I can use your bench directly in C# as well!

victlu avatar Apr 06 '21 22:04 victlu

Yes, it would be nice to benchmark the same scenario and the same data composition in all languages. At least the generator can be completely language independent. We can create a generator tool that creates the messages and serializes them to bytes (maybe stored in files), then feed the serialized form to language-specific benchmarks that measure decoding/encoding.

tigrannajaryan avatar Apr 06 '21 22:04 tigrannajaryan

I ran a comparison between IntGauge (to be deprecated), Gauge using oneof, and an alternative to oneof.

Encoding

  • Overall 11% slower between proto 0.4 and 0.8.
  • From IntGauge to Gauge using OneOf vs Alternative
    • 14% slower with oneof
    • 6% slower with alternative

Decoding

  • Overall 5% slower between proto 0.4 and 0.8.
  • From IntGauge to Gauge using OneOf vs Alternative
    • 15.7% slower with oneof

Alternative to oneof

Use num_type to specify mutual exclusivity between num_double and num_int. This pattern is supported on all platforms.

enum NumberEnum {
  NUMBER_UNKNOWN = 0;
  NUMBER_DOUBLE = 1;
  NUMBER_INT = 2;
}

message NumberDataPoint {
  // ...

  NumberEnum num_type = 6;
  double num_double = 7;
  sfixed64 num_int = 8;
}
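For illustration, a hand-written Go sketch of roughly what the two shapes look like to generated code (these are simplified stand-in types, not actual protoc output; real generated oneofs use pointer wrapper types, but the indirection is the same idea):

```go
// Contrast of the two Go shapes. The oneof version stores the value
// behind an interface, so setting a point pays an extra heap allocation
// and an indirection; the num_type version keeps everything inline.
package main

import "fmt"

// --- oneof-style: value held behind an interface ---
type isValue interface{ isValue() }
type ValueAsInt struct{ AsInt int64 }
type ValueAsDouble struct{ AsDouble float64 }

func (ValueAsInt) isValue()    {}
func (ValueAsDouble) isValue() {}

type OneofPoint struct {
	Value isValue // wrapper escapes to the heap when stored here
}

// --- num_type alternative: flat struct, no indirection ---
type NumberEnum int32

const (
	NumberUnknown NumberEnum = 0
	NumberDouble  NumberEnum = 1
	NumberInt     NumberEnum = 2
)

type EnumPoint struct {
	NumType   NumberEnum
	NumDouble float64
	NumInt    int64
}

// asInt shows how a consumer reads the flat shape: a field check
// instead of a type assertion.
func asInt(p EnumPoint) (int64, bool) {
	if p.NumType == NumberInt {
		return p.NumInt, true
	}
	return 0, false
}

func main() {
	op := OneofPoint{Value: ValueAsInt{AsInt: 42}} // allocates the wrapper
	ep := EnumPoint{NumType: NumberInt, NumInt: 42}
	v, _ := op.Value.(ValueAsInt)
	n, ok := asInt(ep)
	fmt.Println(v.AsInt, n, ok) // 42 42 true
}
```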

Results

OTLP_0.4 = Proto 0.4
OTLP_HEAD = Proto 0.8 using IntGauge (that is to be deprecated)
OTLP_ONEOF = Proto 0.8
OTLP_ENUMTYPE = Proto 0.8 using the alternative outlined above

BenchmarkEncode/OTLP_0.4/Metric/Int64-8                             2640           2393619 ns/op          183889 B/op       9446 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Int64-8                            1994           2691160 ns/op          207890 B/op      10946 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Int64-8                           2023           3130439 ns/op          216082 B/op      10946 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Int64-8                        2061           2883361 ns/op          207890 B/op      10946 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Summary-8                           7050            893153 ns/op           62800 B/op       3246 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Summary-8                          6418            941001 ns/op           57360 B/op       2946 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Summary-8                         7374            942849 ns/op           57360 B/op       2946 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Summary-8                      7066            933354 ns/op           57360 B/op       2946 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Histogram-8                         4836           1239377 ns/op           86288 B/op       4546 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Histogram-8                        5192           1166733 ns/op           74704 B/op       3646 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Histogram-8                       4526           1159576 ns/op           74704 B/op       3646 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Histogram-8                    5032           1162089 ns/op           74704 B/op       3646 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/HistogramSeries-8                   1248           4933819 ns/op          352276 B/op      18946 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/HistogramSeries-8                  1495           4238791 ns/op          296660 B/op      14446 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/HistogramSeries-8                 1526           4188194 ns/op          296659 B/op      14446 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/HistogramSeries-8              1422           4205836 ns/op          296659 B/op      14446 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/Mix-8                               1387           4427703 ns/op          331666 B/op      17146 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/Mix-8                              1191           4914993 ns/op          336467 B/op      17446 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/Mix-8                             1182           5208637 ns/op          344659 B/op      17446 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/Mix-8                          1278           4905871 ns/op          336467 B/op      17446 allocs/op

BenchmarkEncode/OTLP_0.4/Metric/MixSeries-8                          328          17831027 ns/op         1400729 B/op      74746 allocs/op
BenchmarkEncode/OTLP_HEAD/Metric/MixSeries-8                         314          19079056 ns/op         1441115 B/op      76246 allocs/op
BenchmarkEncode/OTLP_ONEOF/Metric/MixSeries-8                        298          20027797 ns/op         1449307 B/op      76246 allocs/op
BenchmarkEncode/OTLP_ENUMTYPE/Metric/MixSeries-8                     315          19376281 ns/op         1449307 B/op      76246 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Int64-8                             7446            969204 ns/op          201491 B/op       7046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Int64-8                            5598           1021838 ns/op          210291 B/op       7146 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Int64-8                           5410           1213289 ns/op          211877 B/op       7646 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Int64-8                        5298           1007639 ns/op          218291 B/op       7146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Summary-8                          15856            360051 ns/op           75889 B/op       2246 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Summary-8                         16406            384796 ns/op           63072 B/op       2146 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Summary-8                        15702            388343 ns/op           63072 B/op       2146 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Summary-8                     16026            373828 ns/op           63072 B/op       2146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Histogram-8                        14041            437210 ns/op           99873 B/op       2946 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Histogram-8                       13273            455023 ns/op           87872 B/op       2846 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Histogram-8                      14428            445726 ns/op           87872 B/op       2846 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Histogram-8                   14431            463288 ns/op           87873 B/op       2846 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/HistogramSeries-8                   3122           1760839 ns/op          367079 B/op      12046 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/HistogramSeries-8                  3453           1597534 ns/op          335879 B/op      11146 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/HistogramSeries-8                 3976           1550361 ns/op          335879 B/op      11146 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/HistogramSeries-8              4100           1535395 ns/op          335879 B/op      11146 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/Mix-8                               3564           1708633 ns/op          377237 B/op      12148 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/Mix-8                              3022           1779530 ns/op          361220 B/op      12048 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/Mix-8                             2941           1917367 ns/op          362822 B/op      12548 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/Mix-8                          3722           1796725 ns/op          369221 B/op      12048 allocs/op

BenchmarkDecode/OTLP_0.4/Metric/MixSeries-8                          832           7022575 ns/op         1484446 B/op      51748 allocs/op
BenchmarkDecode/OTLP_HEAD/Metric/MixSeries-8                         834           6873600 ns/op         1484433 B/op      50048 allocs/op
BenchmarkDecode/OTLP_ONEOF/Metric/MixSeries-8                        788           7831306 ns/op         1486045 B/op      52548 allocs/op
BenchmarkDecode/OTLP_ENUMTYPE/Metric/MixSeries-8                     888           6897834 ns/op         1524433 B/op      50048 allocs/op

victlu avatar Apr 12 '21 23:04 victlu

A few things that I have found affect performance in Go:

  • Oneof is slow since it is implemented as an interface, which adds an extra indirection and allocation. An alternative to oneof is a message with plain fields (and optionally an enum if needed). If there is a oneof inside a oneof, maybe rearrange the messages to only have one oneof.
  • The number of allocations grows with the number of separate messages (see if a message can be eliminated, especially deep in the message tree).
  • Memory usage is impacted by the order of the fields due to alignment rules. Reordering the fields in a message may result in less memory usage.
  • Maps are pretty slow; KV lists are significantly faster (this is the reason we use them for attributes).
  • bytes fields of fixed size (e.g. 8 or 16 bytes) can be faster if stored as fixed64 (or a pair of them), since that no longer requires an additional allocation (e.g. this is how Jaeger stores traceid/spanid).
  • (I am not totally sure I remember this correctly.) I think fixed32/64 was a tiny bit faster than int32/64 (but may be larger on the wire, depending on the value).
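The map-versus-KV-list point can be illustrated with a minimal Go sketch (the KeyValue type and the get helper here are illustrative, not the generated OTLP types):

```go
// Minimal sketch of why OTLP attributes are a repeated KeyValue rather
// than a map: the slice is one contiguous allocation scanned linearly,
// while a Go map costs hashing plus per-bucket allocations. For the
// handful of attributes typical on a data point, the scan wins.
package main

import "fmt"

type KeyValue struct {
	Key   string
	Value string
}

// get does a linear scan over the KV list.
func get(attrs []KeyValue, key string) (string, bool) {
	for _, kv := range attrs {
		if kv.Key == key {
			return kv.Value, true
		}
	}
	return "", false
}

func main() {
	attrs := []KeyValue{
		{Key: "host.name", Value: "web-1"},
		{Key: "region", Value: "us-east-1"},
	}
	v, ok := get(attrs, "region")
	fmt.Println(v, ok) // us-east-1 true
}
```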

tigrannajaryan avatar Apr 20 '21 16:04 tigrannajaryan

Plan for working on this bug from the Metrics Data Model SIG:

  • Use Tigran's benchmarking suite to investigate Label=>Attribute Change and possible improvements
  • This work will be timeboxed to April 30th deadline
  • We will merge the "Label => Attribute" PR into the proto directory but NOT issue a release until this work is complete.

The work will be led by me and @victlu, with @tigrannajaryan providing guidance (thanks for the above comment!). If anyone has time to help, feel free to comment on this bug.

jsuereth avatar Apr 20 '21 17:04 jsuereth

So here are a few results I ran using GoGoProto (faster) across every version of OTLP from 0.4 => now:

BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Int64-8         	   26749	     43914 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Int64-8         	   18772	     67075 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Int64-8         	   19136	     62446 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Int64-8         	   18956	     63186 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Int64-8         	   17336	     70961 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Int64-8        	   15318	     74221 ns/op

BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Summary-8       	   69717	     18255 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Summary-8       	   53949	     21852 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Summary-8       	   54154	     21925 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Summary-8       	   54010	     21895 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Summary-8       	   52406	     22943 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Summary-8      	   35911	     32651 ns/op

BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Histogram-8     	   48699	     23925 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Histogram-8     	   39019	     29297 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Histogram-8     	   38566	     30174 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Histogram-8     	   40514	     29930 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Histogram-8     	   36490	     31997 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Histogram-8    	   28291	     42025 ns/op

BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/HistogramSeries-8         	   10000	    101948 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/HistogramSeries-8         	    9901	    127268 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/HistogramSeries-8         	    9841	    114856 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/HistogramSeries-8         	   10000	    109458 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/HistogramSeries-8         	   10000	    114854 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/HistogramSeries-8        	    7179	    240937 ns/op

BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/Mix-8                     	    8173	    137942 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/Mix-8                     	    6388	    163060 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/Mix-8                     	    6651	    173666 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/Mix-8                     	    5416	    202425 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/Mix-8                     	    5396	    188160 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/Mix-8                    	    5061	    212527 ns/op

BenchmarkEncode/OTLP_0.4_(Gogo_Faster)/Metric/MixSeries-8               	    2090	    608697 ns/op
BenchmarkEncode/OTLP_0.5_(Gogo_Faster)/Metric/MixSeries-8               	    1459	    795073 ns/op
BenchmarkEncode/OTLP_0.6_(Gogo_Faster)/Metric/MixSeries-8               	    1659	    734161 ns/op
BenchmarkEncode/OTLP_0.7_(Gogo_Faster)/Metric/MixSeries-8               	    1671	    824701 ns/op
BenchmarkEncode/OTLP_0.8_(Gogo_Faster)/Metric/MixSeries-8               	    1184	   1285495 ns/op
BenchmarkEncode/OTLP_HEAD_(Gogo_Faster)/Metric/MixSeries-8              	    1050	   1281293 ns/op

BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Int64-8                   	    3009	    348570 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Int64-8                   	    3894	    287816 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Int64-8                   	    3750	    289633 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Int64-8                   	    3910	    285599 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Int64-8                   	    3678	    293326 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Int64-8                  	    3639	    304446 ns/op

BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Summary-8                 	   13567	     88821 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Summary-8                 	   15590	     77919 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Summary-8                 	   15488	     78906 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Summary-8                 	   15544	     76030 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Summary-8                 	   15144	     80054 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Summary-8                	   12070	     99301 ns/op

BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Histogram-8               	   10000	    116692 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Histogram-8               	   11652	    123209 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Histogram-8               	    9837	    108647 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Histogram-8               	    9891	    115249 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Histogram-8               	    9692	    118362 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Histogram-8              	    7743	    146608 ns/op

BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/HistogramSeries-8         	    2151	    474082 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/HistogramSeries-8         	    2374	    448891 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/HistogramSeries-8         	    2516	    483569 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/HistogramSeries-8         	    2412	    468562 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/HistogramSeries-8         	    2698	    476729 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/HistogramSeries-8        	    1855	    592413 ns/op

BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/Mix-8                     	    2259	    547694 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/Mix-8                     	    2108	    526082 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/Mix-8                     	    2148	    515173 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/Mix-8                     	    2185	    507508 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/Mix-8                     	    2295	    522157 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/Mix-8                    	    2098	    604631 ns/op

BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/MixSeries-8               	     453	   2390820 ns/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/MixSeries-8               	     571	   2131050 ns/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/MixSeries-8               	     513	   2314529 ns/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/MixSeries-8               	     489	   2085016 ns/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/MixSeries-8               	     568	   2307425 ns/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/MixSeries-8              	     447	   2636537 ns/op

Additionally, a look at the encoded byte sizes:

Encoding                             Uncompressed [Improved]   zlib [Improved]   zstd [Improved]
OTLP 0.4 (Gogo Faster)/Metric/Gauge  29982 bytes [1.000], zlib  1750 bytes [1.000], zstd  1861 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/Gauge  29882 bytes [1.003], zlib  1743 bytes [1.004], zstd  1858 bytes [1.002]
OTLP 0.6 (Gogo Faster)/Metric/Gauge  29882 bytes [1.003], zlib  1743 bytes [1.004], zstd  1858 bytes [1.002]
OTLP 0.7 (Gogo Faster)/Metric/Gauge  29882 bytes [1.003], zlib  1743 bytes [1.004], zstd  1858 bytes [1.002]
OTLP 0.8 (Gogo Faster)/Metric/Gauge  34382 bytes [0.872], zlib  1998 bytes [0.876], zstd  1920 bytes [0.969]
OTLP HEAD (Gogo Faster)/Metric/Gauge  34382 bytes [0.872], zlib  1998 bytes [0.876], zstd  1920 bytes [0.969]

Encoding                             Uncompressed [Improved]   zlib [Improved]   zstd [Improved]
OTLP 0.4 (Gogo Faster)/Metric/Histogram  13170 bytes [1.000], zlib  1824 bytes [1.000], zstd  1313 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/Histogram  14580 bytes [0.903], zlib  1849 bytes [0.986], zstd  1361 bytes [0.965]
OTLP 0.6 (Gogo Faster)/Metric/Histogram  14580 bytes [0.903], zlib  1849 bytes [0.986], zstd  1361 bytes [0.965]
OTLP 0.7 (Gogo Faster)/Metric/Histogram  14580 bytes [0.903], zlib  1849 bytes [0.986], zstd  1361 bytes [0.965]
OTLP 0.8 (Gogo Faster)/Metric/Histogram  15480 bytes [0.851], zlib  1882 bytes [0.969], zstd  1341 bytes [0.979]
OTLP HEAD (Gogo Faster)/Metric/Histogram  15880 bytes [0.829], zlib  1861 bytes [0.980], zstd  1373 bytes [0.956]

Encoding                             Uncompressed [Improved]   zlib [Improved]   zstd [Improved]
OTLP 0.4 (Gogo Faster)/Metric/MixOne  53332 bytes [1.000], zlib  3921 bytes [1.000], zstd  3876 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/MixOne  53142 bytes [1.004], zlib  3894 bytes [1.007], zstd  3897 bytes [0.995]
OTLP 0.6 (Gogo Faster)/Metric/MixOne  53142 bytes [1.004], zlib  3894 bytes [1.007], zstd  3897 bytes [0.995]
OTLP 0.7 (Gogo Faster)/Metric/MixOne  53142 bytes [1.004], zlib  3894 bytes [1.007], zstd  3897 bytes [0.995]
OTLP 0.8 (Gogo Faster)/Metric/MixOne  59442 bytes [0.897], zlib  4232 bytes [0.927], zstd  3927 bytes [0.987]
OTLP HEAD (Gogo Faster)/Metric/MixOne  60242 bytes [0.885], zlib  4251 bytes [0.922], zstd  4013 bytes [0.966]

Encoding                             Uncompressed [Improved]   zlib [Improved]   zstd [Improved]
OTLP 0.4 (Gogo Faster)/Metric/MixSeries 199279 bytes [1.000], zlib 12513 bytes [1.000], zstd 14904 bytes [1.000]
OTLP 0.5 (Gogo Faster)/Metric/MixSeries 210498 bytes [0.947], zlib 12660 bytes [0.988], zstd 14564 bytes [1.023]
OTLP 0.6 (Gogo Faster)/Metric/MixSeries 210498 bytes [0.947], zlib 12660 bytes [0.988], zstd 14564 bytes [1.023]
OTLP 0.7 (Gogo Faster)/Metric/MixSeries 210498 bytes [0.947], zlib 12660 bytes [0.988], zstd 14564 bytes [1.023]
OTLP 0.8 (Gogo Faster)/Metric/MixSeries 227742 bytes [0.875], zlib 13412 bytes [0.933], zstd 15063 bytes [0.989]
OTLP HEAD (Gogo Faster)/Metric/MixSeries 231742 bytes [0.860], zlib 13548 bytes [0.924], zstd 14690 bytes [1.015]

Obvious points

  • Every nested message added increases the serialized size.
  • oneof usage has been slowly degrading (Go) performance since 0.4.
    • 0.4 => 0.5 sees a major hit to performance from oneofs for the metric type vs. enum/descriptor.
    • 0.8 => present sees another hit to encoding performance due to Attributes vs. Labels.
  • When viewing the "MixSeries-8" benchmark, the relative performance differential is not as noticeable.

@victlu confirmed similar results in Go.

Next AIs (action items)

  • @victlu is going to check oneof performance in C# to see if the drops in performance are as severe.
  • We're going to investigate using optional fields + a single enum vs. oneof in Go to determine if this improves performance significantly.
  • I'm going to try some isolated changes around the larger performance dips to narrow down the root cause (oneof, added nested messages, or other).

jsuereth avatar Apr 23 '21 19:04 jsuereth

We're going to investigate using optional fields + a single enum vs. oneof in Go to determine if this improves performance significantly

IIRC: usually for a very small number of options in oneof a struct+enum is both faster and smaller in memory. The more options you add the larger memory usage in struct+enum becomes. At some point it starts using more memory than oneof and for even larger number of options if I remember correctly it starts slowing down due to too much memory used. TLDR: you will need to measure to see if struct+enum is faster and/or smaller than oneof.
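In Go terms, the two shapes look roughly like this. A hedged sketch: `MetricOneof`, `MetricFlat`, and the field names are hypothetical stand-ins for the generated code, not the actual gogoproto output.

```go
package main

import "fmt"

// Oneof style: the generator emits an interface field plus one wrapper
// struct per option. Setting a value allocates a wrapper, and reading
// it requires a type assertion/switch (dynamic dispatch).
type isMetricData interface{ isMetricData() }

type Gauge struct{ Value float64 }
type Sum struct{ Value float64 }

type MetricData_Gauge struct{ Gauge *Gauge }
type MetricData_Sum struct{ Sum *Sum }

func (*MetricData_Gauge) isMetricData() {}
func (*MetricData_Sum) isMetricData()   {}

type MetricOneof struct {
	Name string
	Data isMetricData
}

// Struct+enum style: every option is a plain field; a tag says which
// one is set. No wrapper allocation and no interface, but the struct
// is larger and grows with every option added.
type MetricType int

const (
	TypeInvalid MetricType = iota
	TypeGauge
	TypeSum
)

type MetricFlat struct {
	Name  string
	Type  MetricType
	Gauge *Gauge
	Sum   *Sum
}

// describe dispatches on the enum tag; no dynamic dispatch involved.
func describe(m *MetricFlat) string {
	switch m.Type {
	case TypeGauge:
		return fmt.Sprintf("gauge=%v", m.Gauge.Value)
	case TypeSum:
		return fmt.Sprintf("sum=%v", m.Sum.Value)
	}
	return "invalid"
}

func main() {
	o := &MetricOneof{Name: "m", Data: &MetricData_Gauge{Gauge: &Gauge{Value: 1.5}}}
	f := &MetricFlat{Name: "m", Type: TypeGauge, Gauge: &Gauge{Value: 1.5}}
	if g, ok := o.Data.(*MetricData_Gauge); ok {
		fmt.Println("oneof:", g.Gauge.Value)
	}
	fmt.Println("flat:", describe(f))
}
```

The memory trade-off described above falls out of these shapes: the flat struct pays for every option field in every instance, while the oneof pays one interface word plus a per-set wrapper allocation.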

tigrannajaryan avatar Apr 23 '21 19:04 tigrannajaryan

Hint: use -benchmem to see memory usage (but be aware that it is the total for everything the benchmark does, so you will need to interpret it carefully).

tigrannajaryan avatar Apr 23 '21 19:04 tigrannajaryan

Here's a look at "no oneof" in the entire protocol. There is a decent memory usage bump for a performance gain:

BenchmarkDecode/OTLP_0.4_(Gogo_Faster)/Metric/MixSeries-8                         	     464	   3364230 ns/op	  853510 B/op	   38137 allocs/op
BenchmarkDecode/OTLP_0.5_(Gogo_Faster)/Metric/MixSeries-8                         	     454	   2783817 ns/op	  915910 B/op	   36437 allocs/op
BenchmarkDecode/OTLP_0.6_(Gogo_Faster)/Metric/MixSeries-8                         	     483	   2425550 ns/op	  915908 B/op	   36437 allocs/op
BenchmarkDecode/OTLP_0.7_(Gogo_Faster)/Metric/MixSeries-8                         	     547	   2459825 ns/op	  915910 B/op	   36437 allocs/op
BenchmarkDecode/OTLP_0.8_(Gogo_Faster)/Metric/MixSeries-8                         	     457	   2591545 ns/op	  920704 B/op	   39937 allocs/op
BenchmarkDecode/OTLP_HEAD_(Gogo_Faster)/Metric/MixSeries-8                        	     423	   2716310 ns/op	 1111123 B/op	   43937 allocs/op
BenchmarkDecode/OTLP_HEAD_No_oneof_(Gogo_Faster)/Metric/MixSeries-8               	     483	   2438340 ns/op	 1119909 B/op	   40137 allocs/op

Thanks for the -benchmem tip. Still new to Go.

jsuereth avatar Apr 23 '21 20:04 jsuereth

I ran .NET C# benchmarks across all our OTLP proto versions. This is an extensive run (~4 hours), so jitter artifacts should not be an issue.

My takeaway from this test comparing v0.4.0 with v0.8.0:

  • Computation-wise, we are overall slower by about 6%. The outlier is DecodeGauge, which is slower by 16%.
  • Memory-allocation-wise, we are better than or equal to 0.4.0 across all operations; in some cases we are better by 20%.

Results

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-1065G7 CPU 1.30GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
  Job-UQXYHB : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT

IterationCount=50  LaunchCount=10  WarmupCount=10

|          Method | Version |      Mean |    Error |    StdDev |    Median |   Gen 0 |   Gen 1 | Gen 2 | Allocated |
|---------------- |-------- |----------:|---------:|----------:|----------:|--------:|--------:|------:|----------:|
|     EncodeGauge |   0.4.0 |  90.38 us | 0.743 us |  4.922 us |  88.93 us | 20.9961 |  0.1221 |     - |  85.98 KB |
|     DecodeGauge |   0.4.0 |  80.52 us | 0.359 us |  2.367 us |  79.65 us | 30.2734 |  0.6104 |     - | 124.03 KB |
|   EncodeSummary |   0.4.0 | 120.16 us | 0.796 us |  5.251 us | 118.80 us | 29.0527 |  6.3477 |     - | 118.99 KB |
|   DecodeSummary |   0.4.0 | 224.80 us | 2.788 us | 18.662 us | 217.48 us | 49.0723 |  1.2207 |     - | 200.59 KB |
| EncodeHistogram |   0.4.0 | 166.26 us | 1.102 us |  7.261 us | 163.73 us | 41.0156 | 13.6719 |     - | 168.02 KB |
| DecodeHistogram |   0.4.0 | 277.14 us | 1.827 us | 11.845 us | 272.85 us | 74.2188 | 23.4375 |     - |  309.4 KB |
|     EncodeGauge |   0.5.0 |  93.26 us | 0.512 us |  3.351 us |  92.47 us | 19.1650 |  0.3662 |     - |  78.66 KB |
|     DecodeGauge |   0.5.0 |  81.69 us | 0.391 us |  2.530 us |  80.84 us | 28.4424 |       - |     - | 116.22 KB |
|   EncodeSummary |   0.5.0 |  95.05 us | 0.549 us |  3.596 us |  94.34 us | 19.1650 |  0.1221 |     - |  78.56 KB |
|   DecodeSummary |   0.5.0 |  90.39 us | 0.428 us |  2.757 us |  89.52 us | 30.5176 |  0.4883 |     - | 124.81 KB |
| EncodeHistogram |   0.5.0 | 154.30 us | 0.760 us |  4.806 us | 153.25 us | 36.1328 |  0.2441 |     - |  147.7 KB |
| DecodeHistogram |   0.5.0 | 260.91 us | 6.413 us | 41.713 us | 245.25 us | 60.5469 | 20.0195 |     - | 257.84 KB |
|     EncodeGauge |   0.6.0 |  92.14 us | 0.918 us |  5.845 us |  90.29 us | 16.3574 |  0.1221 |     - |  67.06 KB |
|     DecodeGauge |   0.6.0 | 100.51 us | 1.193 us |  7.711 us |  97.63 us | 29.1748 |  0.2441 |     - | 119.34 KB |
|   EncodeSummary |   0.6.0 |  89.41 us | 0.709 us |  4.612 us |  87.85 us | 16.6016 |       - |     - |  67.84 KB |
|   DecodeSummary |   0.6.0 |  91.23 us | 0.387 us |  2.566 us |  90.50 us | 28.4424 |       - |     - | 116.22 KB |
| EncodeHistogram |   0.6.0 | 166.32 us | 1.428 us |  9.412 us | 163.94 us | 36.8652 |  9.2773 |     - | 151.83 KB |
| DecodeHistogram |   0.6.0 | 279.38 us | 2.231 us | 14.414 us | 275.05 us | 70.8008 | 22.9492 |     - | 304.71 KB |
|     EncodeGauge |   0.7.0 |  97.39 us | 1.619 us | 10.779 us |  94.38 us | 16.3574 |  0.1221 |     - |  67.06 KB |
|     DecodeGauge |   0.7.0 |  99.56 us | 0.987 us |  6.559 us |  97.21 us | 29.0527 |  0.4883 |     - | 119.34 KB |
|   EncodeSummary |   0.7.0 | 122.87 us | 0.933 us |  6.127 us | 120.65 us | 23.6816 |  4.6387 |     - |  97.24 KB |
|   DecodeSummary |   0.7.0 | 235.73 us | 2.654 us | 17.616 us | 229.94 us | 46.3867 |  3.9063 |     - | 189.66 KB |
| EncodeHistogram |   0.7.0 | 165.69 us | 1.245 us |  8.091 us | 162.92 us | 36.8652 |  9.2773 |     - | 151.83 KB |
| DecodeHistogram |   0.7.0 | 273.84 us | 1.980 us | 12.905 us | 269.18 us | 68.8477 | 21.9727 |     - | 304.71 KB |
|     EncodeGauge |   0.8.0 |  92.84 us | 0.687 us |  4.558 us |  91.60 us | 17.0898 |  3.4180 |     - |  70.19 KB |
|     DecodeGauge |   0.8.0 |  94.21 us | 0.707 us |  4.688 us |  92.42 us | 29.7852 |  0.2441 |     - | 121.69 KB |
|   EncodeSummary |   0.8.0 | 127.31 us | 2.397 us | 15.724 us | 121.89 us | 23.8037 |  2.8076 |     - |  97.24 KB |
|   DecodeSummary |   0.8.0 | 238.60 us | 2.169 us | 14.517 us | 235.57 us | 46.3867 |  4.8828 |     - | 189.66 KB |
| EncodeHistogram |   0.8.0 | 176.01 us | 1.245 us |  8.271 us | 173.24 us | 41.0156 |  2.6855 |     - | 167.84 KB |
| DecodeHistogram |   0.8.0 | 290.50 us | 2.308 us | 15.303 us | 285.33 us | 68.3594 | 19.5313 |     - | 317.21 KB |

Analysis

The Time Diff and Mem Diff columns compare each version against the 0.4.0 baseline using the symmetric percent-difference formula (|a − b| divided by the mean of a and b), so they show the magnitude of change, not its direction.

| Method      | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|------------ |-------- |----------:|---------------:|----------:|---------:|
| DecodeGauge | 0.4.0   | 80.52     | 124.03         | 0%        | 0%       |
| DecodeGauge | 0.5.0   | 81.69     | 116.22         | 1%        | 7%       |
| DecodeGauge | 0.6.0   | 100.51    | 119.34         | 22%       | 4%       |
| DecodeGauge | 0.7.0   | 99.56     | 119.34         | 21%       | 4%       |
| DecodeGauge | 0.8.0   | 94.21     | 121.69         | 16%       | 2%       |

| Method      | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|------------ |-------- |----------:|---------------:|----------:|---------:|
| EncodeGauge | 0.4.0   | 90.38     | 85.98          | 0%        | 0%       |
| EncodeGauge | 0.5.0   | 93.26     | 78.66          | 3%        | 9%       |
| EncodeGauge | 0.6.0   | 92.14     | 67.06          | 2%        | 25%      |
| EncodeGauge | 0.7.0   | 97.39     | 67.06          | 7%        | 25%      |
| EncodeGauge | 0.8.0   | 92.84     | 70.19          | 3%        | 20%      |

| Method        | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|-------------- |-------- |----------:|---------------:|----------:|---------:|
| DecodeSummary | 0.4.0   | 224.8     | 200.59         | 0%        | 0%       |
| DecodeSummary | 0.5.0   | 90.39     | 124.81         | 85%       | 47%      |
| DecodeSummary | 0.6.0   | 91.23     | 116.22         | 85%       | 53%      |
| DecodeSummary | 0.7.0   | 235.73    | 189.66         | 5%        | 6%       |
| DecodeSummary | 0.8.0   | 238.6     | 189.66         | 6%        | 6%       |

| Method        | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|-------------- |-------- |----------:|---------------:|----------:|---------:|
| EncodeSummary | 0.4.0   | 120.16    | 118.99         | 0%        | 0%       |
| EncodeSummary | 0.5.0   | 95.05     | 78.56          | 23%       | 41%      |
| EncodeSummary | 0.6.0   | 89.41     | 67.84          | 29%       | 55%      |
| EncodeSummary | 0.7.0   | 122.87    | 97.24          | 2%        | 20%      |
| EncodeSummary | 0.8.0   | 127.31    | 97.24          | 6%        | 20%      |

| Method          | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---------------- |-------- |----------:|---------------:|----------:|---------:|
| DecodeHistogram | 0.4.0   | 277.14    | 309.4          | 0%        | 0%       |
| DecodeHistogram | 0.5.0   | 260.91    | 257.84         | 6%        | 18%      |
| DecodeHistogram | 0.6.0   | 279.38    | 304.71         | 1%        | 2%       |
| DecodeHistogram | 0.7.0   | 273.84    | 304.71         | 1%        | 2%       |
| DecodeHistogram | 0.8.0   | 290.5     | 317.21         | 5%        | 2%       |

| Method          | Version | Mean (us) | Allocated (KB) | Time Diff | Mem Diff |
|---------------- |-------- |----------:|---------------:|----------:|---------:|
| EncodeHistogram | 0.4.0   | 166.26    | 168.02         | 0%        | 0%       |
| EncodeHistogram | 0.5.0   | 154.3     | 147.7          | 7%        | 13%      |
| EncodeHistogram | 0.6.0   | 166.32    | 151.83         | 0%        | 10%      |
| EncodeHistogram | 0.7.0   | 165.69    | 151.83         | 0%        | 10%      |
| EncodeHistogram | 0.8.0   | 176.01    | 167.84         | 6%        | 0%       |

victlu avatar Apr 24 '21 16:04 victlu

So I compared two experiments for ideas on what happened to performance to try to better isolate the cost of changes.

The first experiment removed all "oneof" instances within the protocol, hanging the option fields directly off the containing message instead. This should be binary compatible, since oneof does not affect the wire encoding. E.g. here's the change to Metric:

Experiment 1: No oneofs

message Metric {
  // name of the metric, including its DNS name prefix. It must be unique.
  string name = 1;

  // description of the metric, which can be used in documentation.
  string description = 2;

  // unit in which the metric value is reported. Follows the format
  // described by http://unitsofmeasure.org/ucum.html.
  string unit = 3;

  Gauge gauge = 5;
  Sum sum = 7;
  Histogram histogram = 9;
  Summary summary = 11;
}

Experiment 2: Flatten Metrics

The second experiment attempts to revert behavior much closer to the 0.4 behavior by removing the early bundling of metrics by type and using a TYPE enum on metric;

message Metric {
  // name of the metric, including its DNS name prefix. It must be unique.
  string name = 1;

  // description of the metric, which can be used in documentation.
  string description = 2;

  // unit in which the metric value is reported. Follows the format
  // described by http://unitsofmeasure.org/ucum.html.
  string unit = 3;

  // Type is the type of values a metric has.
  enum Type {
    // INVALID_TYPE is the default Type, it MUST not be used.
    INVALID_TYPE = 0;

    // TODO: doc
    GAUGE = 1;

    // TODO: doc
    SUM = 2;

    // Histogram measurement.
    // Corresponding values are stored in HistogramDataPoint.
    HISTOGRAM = 3;

    // Summary value. Some frameworks implemented Histograms as a summary of observations
    // (usually things like request durations and response sizes). While it
    // also provides a total count of observations and a sum of all observed
    // values, it calculates configurable percentiles over a sliding time
    // window.
    // Corresponding values are stored in SummaryDataPoint.
    SUMMARY = 4;
  }

  // type is the type of values this metric has.
  Type type = 4;

  // aggregation_temporality describes if the aggregator reports delta changes
  // since last report time, or cumulative changes since a fixed start time.
  // Only used in Sum/Histogram metrics.
  AggregationTemporality aggregation_temporality = 5;
  // If "true" means that the sum/gauge is monotonic.
  // Only valid for Sums + Gauges.
  bool is_monotonic_sum = 6;

  // Only set for Gauge or Sum metrics.
  repeated NumberDataPoint sum_or_gauge_data_points = 10;
  // Only set for Sum metrics (folded into sum_or_gauge_data_points above).
  //repeated NumberDataPoint sum_data_points = 11;
  // Only set for Histogram metrics.
  repeated HistogramDataPoint histogram_data_points = 12;
  // Only set for Summary metrics.
  repeated SummaryDataPoint summary_data_points = 13;
}
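On the Go side, the flat shape replaces a type switch over an interface with a plain switch on the enum tag. A hypothetical consumer sketch (the names mirror the proto above, not actual generated code):

```go
package main

import "fmt"

type NumberDataPoint struct{ Value float64 }
type HistogramDataPoint struct{ Count uint64 }
type SummaryDataPoint struct{ Count uint64 }

type MetricType int

const (
	InvalidType MetricType = iota
	GaugeType
	SumType
	HistogramType
	SummaryType
)

// Metric mirrors Experiment 2: a type tag plus one repeated field per
// point kind, instead of a oneof wrapping per-type messages.
type Metric struct {
	Name                 string
	Type                 MetricType
	SumOrGaugeDataPoints []NumberDataPoint
	HistogramDataPoints  []HistogramDataPoint
	SummaryDataPoints    []SummaryDataPoint
}

// pointCount dispatches on the enum; no interface value is ever
// created, so the hot path avoids the oneof wrapper allocation.
func pointCount(m *Metric) int {
	switch m.Type {
	case GaugeType, SumType:
		return len(m.SumOrGaugeDataPoints)
	case HistogramType:
		return len(m.HistogramDataPoints)
	case SummaryType:
		return len(m.SummaryDataPoints)
	}
	return 0
}

func main() {
	m := &Metric{
		Name:                "latency",
		Type:                HistogramType,
		HistogramDataPoints: []HistogramDataPoint{{Count: 10}, {Count: 20}},
	}
	fmt.Println(pointCount(m))
}
```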

You can find the code (based on @tigrannajaryan's benchmarks) here.

Results

"MixSeries-8" benchmark form linked code. Relative % are from 0.4 baseline.

| Method | Version         | Ns / Operation  | Bytes / Operation | Allocations  |
|------- |---------------- |----------------:|------------------:|-------------:|
| Encode | 0.4             | 513374          | 204816            | 2            |
| Encode | 0.8             | 736628 (+43%)   | 229392 (+12%)     | 2            |
| Encode | HEAD            | 953726 (+85%)   | 237584 (+16%)     | 2            |
| Encode | Exp1: No oneof  | 803390 (+56%)   | 229392 (+12%)     | 2            |
| Encode | Exp2: flat      | 601148 (+17%)   | 237584 (+16%)     | 2            |
| Decode | 0.4             | 2045311         | 853507            | 38137        |
| Decode | 0.8             | 2036024 (99%)   | 920704 (107%)     | 39937 (104%) |
| Decode | HEAD            | 2271604 (111%)  | 1111122 (130%)    | 43937 (115%) |
| Decode | Exp1: No oneof  | 2793006 (136%)  | 1279907 (149%)    | 50137 (131%) |
| Decode | Exp2: flat      | 2241217 (108%)  | 1123125 (131%)    | 43337 (113%) |

Notes:

  • 0.4 was the baseline of the previous metrics. Between 0.4 => 0.8 the Metric message was created, which has a oneof for Histogram, Gauge, Sum, etc.
  • HEAD represents the current main branch of the proto repository. The major change here was moving from string=>string Labels to typed Attributes.
  • The current metric mix is 8 series per metric, with two labels per metric type. As a future task I'm going to attempt to make this metric mix more realistic. For now, shifts in metric representation are OVER-emphasized.

Thoughts

  • The Label => Attribute shift causes a ~10% allocation + performance penalty. Given the reasons behind this change and the prevalence of attributes in the protocol, I think we can re-evaluate attribute performance across the entire protocol at some point. For now, metrics should "look similar" to other signals.
  • Unwinding the Metric type to use an enum and repeated data points with optional fields DOES lead to a (~20%) performance gain over the current encoding. We'll need to discuss this in the Metric Data Model SIG. It's possible there are changes like this which lead to better performance at less of a cognitive cost. Note: the decision to use the current sub-metric message modelling occurred in 0.6. The pull request included a performance benchmark that makes me want to investigate differences between that benchmark and this one, where we do see a slight performance hit w/ 0.6.

Future work

  • Improve generation of sampled data
    • Ensure percentiles in Summary are always generated
    • Add configurable + realistic number of tags/attributes to metrics
    • Ensure both Gauge + Sum (with both Double + Int) values are represented.
  • Clean up the benchmarks to a repeatable + "check-in" friendly state.
  • Discuss future options in DataModel SiG

jsuereth avatar Apr 27 '21 13:04 jsuereth

From @bogdandrutu: use the following substitution (sed syntax, over the .proto files) for key-values, marking them non-nullable in the generated Go:

s+repeated opentelemetry.proto.common.v1.KeyValue \(.*\);+repeated opentelemetry.proto.common.v1.KeyValue \1\
  [ (gogoproto.nullable) = false ];+g
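The effect of `(gogoproto.nullable) = false` is that the generated struct embeds `KeyValue` by value (`[]KeyValue`) instead of by pointer (`[]*KeyValue`), removing one heap allocation per element. A rough illustration with plain structs (not generated code); the counts reported depend on the runtime but the value slice needs far fewer allocations:

```go
package main

import (
	"fmt"
	"testing"
)

type KeyValue struct{ Key, Value string }

const n = 100

// Package-level sinks keep the results heap-allocated so escape
// analysis cannot optimize the builds away.
var (
	sinkPtr []*KeyValue
	sinkVal []KeyValue
)

// buildPtr mimics the default (nullable) output: one heap allocation
// per element, plus the slice's backing array.
func buildPtr() []*KeyValue {
	kvs := make([]*KeyValue, 0, n)
	for i := 0; i < n; i++ {
		kvs = append(kvs, &KeyValue{Key: "k", Value: "v"})
	}
	return kvs
}

// buildVal mimics nullable=false: elements live inside the backing
// array, so only the slice itself is allocated.
func buildVal() []KeyValue {
	kvs := make([]KeyValue, 0, n)
	for i := 0; i < n; i++ {
		kvs = append(kvs, KeyValue{Key: "k", Value: "v"})
	}
	return kvs
}

func main() {
	ptr := testing.AllocsPerRun(100, func() { sinkPtr = buildPtr() })
	val := testing.AllocsPerRun(100, func() { sinkVal = buildVal() })
	fmt.Printf("pointer slice: %.0f allocs, value slice: %.0f allocs\n", ptr, val)
}
```

The downside of non-nullable fields is that an absent message becomes indistinguishable from a zero-valued one, which is why this is only safe where "empty" and "missing" mean the same thing.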

jsuereth avatar Apr 27 '21 16:04 jsuereth

Here's a capture of the discussion from the DataModel SiG:

  • Given @victlu's benchmarks in C#, we expect similar performance characteristics in other non-Go languages.
  • @bogdandrutu had a set of things to try that are Go-specific.
  • The consensus was that, given the experimental results, an alternative Go implementation would likely yield similar benefits to changing the proto structure. Specifically, removing oneofs or using enum + repeated fields are likely things that can be done in a Go-specific way, and they showed little benefit in C#.
  • We're downgrading this work from release-blocking, and we'll continue to push on performance ideas and improvements for Go.

jsuereth avatar Apr 27 '21 17:04 jsuereth

  • Specifically, removing oneofs or using enum + repeated fields are likely things that can be done in a Go-specific way, and saw little benefit in C#.

I am not aware of an easy way to do this in Go. Protobuf generators implement oneof using interfaces, which are slow in Go. The non-easy (but doable) way to make oneof fast is a custom data type (using Gogoproto; not possible with Goproto) which implements a fast Variant data type like this: https://github.com/tigrannajaryan/govariant

tigrannajaryan avatar Apr 27 '21 17:04 tigrannajaryan

@bogdandrutu led the discussion on fixing Go. I assume he was thinking Gogoproto + fast variant, but I'll let him comment here. I'm definitely convinced Bogdan can fix this :)

jsuereth avatar Apr 27 '21 17:04 jsuereth

Note OTel-Go has a variant type for its Attribute value. Surely this can be fixed in Go with a custom solution.

jmacd avatar Apr 27 '21 19:04 jmacd

@jmacd the problem is that Goproto does not have a way to use custom data structures for messages. At least I am not aware of a way (unless you modify the generated code). Gogoproto supports custom data structures and we use it for some messages already in the Collector for performance reasons (not for attribute values yet, but will possibly do in the future).

BTW, this is likely a faster and more compact variant than what Otel-Go uses: https://github.com/tigrannajaryan/govariant :-)

tigrannajaryan avatar Apr 27 '21 19:04 tigrannajaryan

To more clearly delineate the decision documented in: https://github.com/open-telemetry/opentelemetry-proto/issues/287#issuecomment-827787057

This issue was downgraded from release-blocking. We think the performance of the protos is acceptable in most languages (where the alternatives don't provide enough benefit), and we think the Go-specific performance issues can be fixed.

The todos are:

  • [ ] Determine go-specific performance improvements that do not change the protocol structure (just in-go).
  • [ ] Create an on-going performance suite to measure performance @ each release of OTLP and prevent major (unknown) regressions. (I've been working on taking this; I expect it to be a multi-month effort.)

I can split this out into separate bugs if it helps.

jsuereth avatar May 12 '21 20:05 jsuereth

Metrics are now Stable and we are not going to change them in a breaking way, so this is likely non-actionable. Closing.

tigrannajaryan avatar Sep 28 '22 01:09 tigrannajaryan