opentelemetry-dotnet-contrib icon indicating copy to clipboard operation
opentelemetry-dotnet-contrib copied to clipboard

Performance optimization for Serialization of Guid in MessagePackSerializer

Open PhDuck opened this issue 3 years ago • 10 comments

My team has recently been looking at uptaking OpenTelemetry instead of writing directly to EventSource as we currently are doing. While playing around with it we stumbled upon a significant amount of time was being spent serializing Guids.

image

Looking at the serialization method it clear that GUID isn't special cased, which leads to a Convert.ToString(...) call which of course calls ToString() on the guid, followed by a Encoding.UTF8.GetBytes(...) call. : https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/977bbf540226285ee4f9188d1453621e4cd909f6/src/OpenTelemetry.Exporter.Geneva/MessagePackSerializer.cs#L527

This extra work should be avoidable with some clever usage of structlayout over the guids internals.

Talked with @mic-max who asked me to open this issue.

PhDuck avatar Jul 26 '22 00:07 PhDuck

@mic-max @PhDuck I want to have a contribution try but as i'm new to the area so can you give some suggestions? Do we need to change this method to remove something?

xyq175com avatar Jul 27 '22 23:07 xyq175com

@xyq175com The idea is to add a case Guid to the switch statement in Serialize linked in the issue. Currently a Guid object will fall into the default case and run the following code:

string repr = Convert.ToString(obj, CultureInfo.InvariantCulture);
return SerializeUnicodeString(buffer, cursor, repr);

Instead we'd like to create a more efficient serialization of Guid that works in the same way. Make sure all the test cases pass and ideally add some extras for Guid both invalid and valid ones and check that the performance is indeed improved with the change.

mic-max avatar Aug 02 '22 23:08 mic-max

@mic-max Thank you! Though i'm not 100% confident for this, but I will drop a PR per my understanding on this.

xyq175com avatar Aug 03 '22 02:08 xyq175com

I think we should hold on until there is good evidence that this is a good thing to do.

@PhDuck what were you trying to achieve and what does your current code look like?

reyang avatar Aug 03 '22 07:08 reyang

@reyang What kind of evidence are you looking for? I have some profiler traces that shows 15% of the samples is spent in the above code-path, the image in my original description shows the traces open in PerfView showing the 15% of samples spent there.

Our code is quite large, but fundamentally we are trying to switch from our current straight EventSource based logging approach to using OpenTelemetry with the Geneva exporter to emit events to the monitoring agent.

Our schema for logging features a decent amount of GUIDs since these are used to identify operations, so it would be really nice to try to optimize this path. Else the extra overhead might make us stay with direct to EventSource.

PhDuck avatar Aug 03 '22 11:08 PhDuck

@reyang What kind of evidence are you looking for? I have some profiler traces that shows 15% of the samples is spent in the above code-path, the image in my original description shows the traces open in PerfView showing the 15% of samples spent there.

Our code is quite large, but fundamentally we are trying to switch from our current straight EventSource based logging approach to using OpenTelemetry with the Geneva exporter to emit events to the monitoring agent.

Our schema for logging features a decent amount of GUIDs since these are used to identify operations, so it would be really nice to try to optimize this path. Else the extra overhead might make us stay with direct to EventSource.

These are some questions that would help:

  1. What do you expect the GUID to look like in the telemetry backend? String is probably a bad choice, because GUID is essentially 16-bytes while the string representation could be something like 36 - depending on whether you want the curly braces {} and hyphen -.
  2. Why cannot you do the conversion in your code (e.g. maybe a base64 encoded string, which is smaller/faster)? GUID is not a primitive type in almost all programming languages, in general we want to keep a very high bar on any language one-offs. I can imagine going down the path there will be more asks to support GUID with different styles, DateTime, DateTimeOffset, Enum/Flags, etc.

reyang avatar Aug 03 '22 17:08 reyang

It seems rather strange to me why the formatting of the data is questioned here, I hope the serializer doesn't care about the format and just serializes data (+ maybe type info). Based on reading the code that already seems to be the case.

We can indeed do it, but I'm concerned with the overhead of allocating to many more objects. I'm already seeing that OpenTelemetry + Geneva exporter is slower than normal EventSource -> ETW.

I don't know how much of that is offset by the MA having to do less by not having to serialize before sending it off. If there is any comparison of the two paths done by you, I would love to read it and understand it better.

The last 24 hours we emitted 33 billion logs via ETW and MA without problems. I have no clue if others are emitting more or less, but at this rate making it a few times more expensive would be significant.

PhDuck avatar Aug 03 '22 21:08 PhDuck

It seems rather strange to me why the formatting of the data is questioned here, I hope the serializer doesn't care about the format and just serializes data (+ maybe type info). Based on reading the code that already seems to be the case.

Sorry I don't understand it. The serialization is using a vendor agnostic format which only supports primitive types, arrays/lists and dictionaries. GUID is not supported (and to my knowledge GUID is not supported by the agent as well). Would you explain what does it mean by "just serializes data"? It seems from the original issue description you were suggesting something else?

We can indeed do it, but I'm concerned with the overhead of allocating to many more objects. I'm already seeing that OpenTelemetry + Geneva exporter is slower than normal EventSource -> ETW.

I think it's possible for you to do it without allocating many more objects. Have you explored the stack alloc or object pooling approach?

reyang avatar Aug 04 '22 01:08 reyang

Well, you are using your own implantation of MessagePack as the serializer, so you are free to extend it. See the spec on extension types.

Or you can take a look at how MessagePack-CSharp ended up supporting Guid.

[...] GUID is not supported (and to my knowledge GUID is not supported by the agent as well). [...] How does it then handle them when they are coming from ETW?

@reyang If you don't want to implement GUID support that is fair enough. However, I would argue that sending guid in logs is a rather common scenario, so I would appreciate the framework supporting it.

Thanks for your responses already.

PhDuck avatar Aug 05 '22 19:08 PhDuck

Well, you are using your own implantation of MessagePack as the serializer, so you are free to extend it. See the spec on extension types.

Or you can take a look at how MessagePack-CSharp ended up supporting Guid.

I don't think adding extension type from the exporter would work (I guess it'll end up having the agent dropping the data on the floor), if you think it will, feel free to send a prototype PR.

[...] GUID is not supported (and to my knowledge GUID is not supported by the agent as well). [...] How does it then handle them when they are coming from ETW?

@reyang If you don't want to implement GUID support that is fair enough. However, I would argue that sending guid in logs is a rather common scenario, so I would appreciate the framework supporting it.

Thanks for your responses already.

There have been many discussions around how to make proper design across programming languages / platforms, so far, it's relatively low priority https://github.com/open-telemetry/opentelemetry-specification/pull/1777#issuecomment-870723113.

reyang avatar Aug 05 '22 19:08 reyang

Feel free to reopen the issue if you still have questions.

utpilla avatar Oct 12 '22 01:10 utpilla