rabbitmq-dotnet-client
rabbitmq-dotnet-client copied to clipboard
Use System.Diagnostics.DiagnosticSource for tracing and telemetry
This is something that I have been wondering about for some time on what would be the best approach for observability.
Since debugging a library like this can be hard, I wonder what we'd want to do to make this easier.
I'd really like to suggest using System.Diagnostics.DiagnosticSource to provide parity with a lot of other .NET libraries and frameworks (both Core and Framework).
Using this library it becomes a lot easier to plug in logging frameworks and profiling tools to monitor and measure operations within the library. It also has the added benefit to have next to no overhead when no-one is listening for the events, since it is exposing ETW events (and in .NET Core, can expose event counters).
I also suggest looking into the new Tracing API on Activity which should be part of .NET 5 release.
Yeah, that looks good. A good first step would be to add basic Activity support, ETW and event counters. I'm planning to look at that soon-ish. This would be a logical next step after that.
Hi @stebet, just wanted to follow-up on your previous note and see if you had any sorta update or a product roadmap you could share around Activity support?
@jaycdave88 right now the plan is to investigate System.Diagnostics.DiagnosticSource. This may or may not happen by 7.0 but can go into a 7.x
release. We need to spike it first. If there are popular alternatives in the community, we would consider them. Let us know ;)
@michaelklishin Depending on the timeline, definitely check out the new ActivitySource API. Very similar to DiagnosticSource, but it supports sampling and you can write tags/attributes on the events instead of writing raw objects (which people usually need to consume reflectively). It releases with .NET 5 (~November), but you don't need .NET 5 to use it. The NuGet package targets all actively supported .NET versions.
The main benefit to using ActivitySource is it means instrumentation libraries like OpenTelemetry will work without needing to build an adapter or anything.
Also happy to help with this if you are open to it.
If ActivitySource will support all then/currently supported versions of .NET, we should consider it. A proof-of-concept PR would be appreciated (I assume there are preview releases of some kind that developers can try?)
ActivitySource should be the correct one to look at. I spiked this myself a few months back, time to revisit :)
What would people consider good activities/counters to start with?
Here are a few rough ideas:
- Bytes sent
- Bytes received
- Frames deserialized
- Frames serialized
- Total Connections opened
- Connections currently open
- Consumers active
- Channels Open
- Total channels opened
- ???
If ActivitySource will support all then/currently supported versions of .NET, we should consider it. A proof-of-concept PR would be appreciated (I assume there are preview releases of some kind that developers can try?)
ActivitySource/Activity
are part of the nuget package https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/5.0.0-rc.1.20451.14. It can be used right now with preview releases.
An example on using ActivitySource
is available here in OpenTelemetry repo.
https://github.com/open-telemetry/opentelemetry-dotnet/blob/master/examples/Console/TestConsoleExporter.cs#L41-L94
ActivitySource should be the correct one to look at. I spiked this myself a few months back, time to revisit :)
What would people consider good activities/counters to start with?
Here are a few rough ideas:
- Bytes sent
- Bytes received
- Frames deserialized
- Frames serialized
- Total Connections opened
- Connections currently open
- Consumers active
- Channels Open
- Total channels opened
- ???
I think something likeFrames serializing
should be considered, the telemetry instruments may want to inject tracing header into the frame before it serialized to send.
Just to clarify - ActivitySource/Activity API is for tracing , and not for metrics. Things like BytesSent, etc are metric, which require a metric API. .NET has EventCounter
API to support this already, but it is a very limited API. There is a new Metric API (Opentelemetry compatible), coming with .NET 6 (Nov 2021), which will support all the old versions of the .NET itself. (https://github.com/open-telemetry/opentelemetry-dotnet/issues/1501)
@cijothomas Yep. I'm aware of the differences. We're looking to integrate both the new dotnet counters (for metrics) as well as support for Activity Tracing. Tracing will come later since the OpenTelemetry is currently being worked on but I'm watching it closely :)
@stebet Thanks! To instrument, you won't need OpenTelemetry dependency, right? You can use 5.0.0 of DiagnosticSource. (Also OpenTelemetry will release 1.0.0 in <2 weeks)
Please also check the Metric plans in .NET 6 as well.
@stebet Thanks! To instrument, you won't need OpenTelemetry dependency, right? You can use 5.0.0 of DiagnosticSource. (Also OpenTelemetry will release 1.0.0 in <2 weeks)
Please also check the Metric plans in .NET 6 as well.
Thanks for the info :)
Also consider adopting the recommended tags and naming conventions from OpenTelemetry.
And if you need an example of dealing with ActivitySource, I added support for the Trace Context standard and activity source for NServiceBus:
https://github.com/jbogard/NServiceBus.Extensions.Diagnostics/blob/master/src/NServiceBus.Extensions.Diagnostics/IncomingPhysicalMessageDiagnostics.cs
https://github.com/jbogard/NServiceBus.Extensions.Diagnostics/blob/master/src/NServiceBus.Extensions.Diagnostics/OutgoingPhysicalMessageDiagnostics.cs
https://github.com/jbogard/NServiceBus.Extensions.Diagnostics/blob/master/src/NServiceBus.Extensions.Diagnostics/SettingsActivityEnricher.cs
ActivitySource should be the correct one to look at. I spiked this myself a few months back, time to revisit :)
What would people consider good activities/counters to start with?
Here are a few rough ideas:
- Bytes sent
- Bytes received
- Frames deserialized
- Frames serialized
- Total Connections opened
- Connections currently open
- Consumers active
- Channels Open
- Total channels opened
- ???
I wanted to play a bit with it and found that the PollingCounters are only NS2.1 / .net core 3.0. I guess that means we can't do it for NS.2.0 targets or am I missing something?
Let's make a distinction between activities and counters since they aren't the same thing (although related).
There are Event counters in NS2.0 targets, but it's a lot less capable. It'd be fine by me if we only had counters for NS2.1+ and skip them for NS2.0, which is on the way out eventually anyway.
Activities can work for both targets.
Shall I create a new one for the counters, so we don't mix them up in here?
There are Event counters in NS2.0 targets, but it's a lot less capable.
Are there? I wasn't able to find any useable classes?
It'd be fine by me if we only had counters for NS2.1+ and skip them for NS2.0, which is on the way out eventually anyway.
Works for me 👍
When using OpenTelemetry for tracing across services, the span context needs to be channeled over the wire. Will this issue contain the encoding (while publishing) and decoding (while receiving) from and to the .NET Activity framework be included here or is this expected to be done somewhere else? To my limited insights, I do not see a hookup infrastructure where a third party could include a middleware into the bare RabbitMQ.Client when sending/receiving messages.
I am currently building a middleware layer on top of RabbitMQ and wonder, whether I should write a "default" middleware doing the OpenTelemetry decoding.
The code how this OpenTelemetry span context propagation works without support from the RabbitMQ.Client
is described in this article. The code fragments in the article also show how it works transparently for HTTP.
We similarly have a wrapper around RabbitMQ.Client we've been using for a few years (that exposes consumers/exchanges as ISource/TargetBlock<AmqpMessage>
) which we instrumented for OpenTracing and are looking at moving to OpenTelemetry/Activity.
We currently extract/inject trace context into message headers on consume/publish but it would be nice if the base client library handled that as well as the Activity
creation.
Where I'm uncertain is how to handle the length of the Activity (we only use Consumer based flows, but manual Get would likely need to be different?). There is the period from Prefetch to the Consumer implementation finishing the event/callback. The "processing" of the message could either be inside that span, resulting in an ack/nack before returning from the Consumer event/callback or, as in our framework, the AmqpMessage
could be placed in a client side queue (ex. BufferBlock) and processed potentially after the Consumer event/callback is completed (ex. by an attached ActionBlock). (We stash the Channel in the AmqpMessage headers so we can ack/nack the message when processing is completed).
I suppose that the client side queue could be modeled in OpenTelemetry as another Consumer-type span with a parent as the Consumer event/callback span that would be completed as soon as the event/callback span has placed the message in the in-process queue.
Is this issue well understood? Are there any design blockers getting this through?
@davidfowl it has been over two years since it was last discussed. So no, it is not well defined.
I've been following this so far https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/examples/MicroserviceExample/Utils/Messaging
It's an example on the open telemetry dotnet repo that uses RabbitMQ.Client and adds the appropriate activities to the producer and consumer.
If we start with this as the proposal, what issues would we run into?
I think it may be worth to share with you the OpenTelemetry semantic conventions (they are experimental, but there is nothing better) for telemetry produced for RabbitMQ:
- https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/
- https://opentelemetry.io/docs/specs/semconv/messaging/rabbitmq/
That question can only be answered by trying it but this looks like an example, not something that can or should be taken straight into production.
This example has some curious decisions around how it uses this client. Instead of inheriting from/extending EventingBasicConsumer
it uses a "helper", channel opening and queue declaration are lumped together, and so on.
I assume that everything around how they use activities is reasonable. The question for library maintains use how do you integrate that in a way that's not too restrictive and does not push OpenTelemetry down everyone's throat.
I'd look for clients for other popular data services (PostgreSQL, Redis, ElasticSearch, etcd, et cetera) to learn from, these examples only demonstrate some key OpenTelemetry API elements (in C#).
@pellared that's helpful, thanks. Now someone who needs this feature should be able to come up with an API update proposal for this client, which won't be trivial. Team RabbitMQ likely won't get to work on this any time soon but this is open source software.
This can be contributed, it’s extremely important to get our core ecosystem libraries instrumented. Let’s get on the same page about how to accomplish this.
The way .NET integrates with open telemetry allows library authors to build support without direct dependencies on otel. Even though the semantic conventions are not final, you want to track and use them. This is what we are doing in .NET and making changes as they change .
The other alternative would be to build it an instrumentation library on top, but, that’s not ideal and you’d need the appropriate hooks in order to do it from the outside of the library.
This would be a point in time solution. This isn’t the way we want libraries to integrate with our telemetry systems.
This is already in progress for the most part here: https://github.com/rabbitmq/rabbitmq-dotnet-client/pull/1261
Had some comments from @lmolkova as well that I need to address. I'll take a look at this again soon. Would love more feedback as well.
Some attributes need to be updated to be in line with the standards (it's been awhile since I last synced), but this is working for the most part. CC: @davidfowl @michaelklishin
Also, implementing it like this via. ActivitySource shouldn't require any API changes. A good PR after the initial implentation would be to add OTel Metrics/Logs support as well.