rabbitmq-dotnet-client icon indicating copy to clipboard operation
rabbitmq-dotnet-client copied to clipboard

Add OpenTelemetry support via. ActivitySource

Open stebet opened this issue 3 years ago • 6 comments

Proposed Changes

This is the basic implementation to add OpenTelemetry support to the .NET RabbitMQ Client. This implements an ActivitySource that creates and propagates Activities for BasicPublish (send), BasicDeliver (receive) and when consumers receive messages (process). This enables distributed tracing scenarios such as showing traces. Below is a screenshot from Honeycomb.io to show what happens when tracing is enabled.

Types of Changes

What types of changes does your code introduce to this project? Put an x in the boxes that apply

  • [ ] Bug fix (non-breaking change which fixes issue #NNNN)
  • [X] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • [ ] Documentation improvements (corrections, new content, etc)
  • [ ] Cosmetic change (whitespace, formatting, etc)

Checklist

  • [X] I have read the CONTRIBUTING.md document
  • [X] I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • [X] All tests pass locally with my changes
  • [X] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added necessary documentation (if appropriate)
  • [ ] Any dependent changes have been merged and published in related repositories

Further Comments

This should have minimal overhead when tracing is not enabled since no Activity generation occurs and no tags need to be populated.

Currently, to propagate the trace and span ids to correlate events, I'm taking advantage of the BasicProperties.Headers to send and parse the information needed to correlate individual traces.

Here is a screenshot from Honeycomb of what this can look like: image

stebet avatar Oct 11 '22 14:10 stebet

It would be great to get some feedback from @cijothomas, @MikeGoldsmith and @martinjt if they have time. I think I've populated the most important tags that are easy to get, but in case I'm missing some, it'd be great to get feedback.

stebet avatar Oct 11 '22 14:10 stebet

Hello @stebet - would you have a moment to give an update here?

lukebakken avatar Oct 11 '22 14:10 lukebakken

This should hopefully be ready now. Feel free to give it another pass @cijothomas if you have time :) @bollhals might find this interesting as well.

stebet avatar Oct 13 '22 08:10 stebet

This should have minimal overhead when tracing is not enabled since no Activity generation occurs and no tags need to be populated.

Would you mind running some tests on this (with and without tracing enabled)? I suspect some boxing occurring even in the disabled case. For the enabled case, this will cause significant impact (performance and allocation wise), as it has lots of tostrings and boxing and stuff.

One question I had was do we need a separate option to enable it? E.g. my business has tracing enabled, but we'd might not want to know this level of detail from RabbitMQ.

bollhals avatar Oct 13 '22 11:10 bollhals

Would you mind running some tests on this (with and without tracing enabled)? I suspect some boxing occurring even in the disabled case. For the enabled case, this will cause significant impact (performance and allocation wise), as it has lots of tostrings and boxing and stuff.

Yeah, will do. Although even when tracing is enabled, if the activity isn't sampled, it'll still skip A LOT of the processing since it checks first if the Activity is requesting all data.

B.t.w, overhead is expected, but the APIs are very fast and used for both runtime stuff (SQL client, HTTP client) as well as as third party libraries like StackExchange.Redis so i.m.o the overhead when tracing is enabled is acceptable and known. That's why Sampling is a built-in feature for Activities.

One question I had was do we need a separate option to enable it? E.g. my business has tracing enabled, but we'd might not want to know this level of detail from RabbitMQ.

Yes. For example if you are using the OpenTelemetry libraries you'll need to do something like this:

builder.Services.AddOpenTelemetryTracing(builder =>
{
    builder
    .Configure((IServiceProvider serviceProvider, TracerProviderBuilder traceBuilder) =>
    {
        traceBuilder
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("test-app", serviceVersion: "1.0.0"))
        .AddSource("RabbitMQ.Client") // Enable tracing for the RabbitMQ.Client library. Not setting this causes no activities to be created within the library.
        .AddAspNetCoreInstrumentation(options => options.RecordException = true)
        .... // more initialization stuff here
        ;
    });
});

stebet avatar Oct 13 '22 11:10 stebet

Would you mind running some tests on this (with and without tracing enabled)? I suspect some boxing occurring even in the disabled case. For the enabled case, this will cause significant impact (performance and allocation wise), as it has lots of tostrings and boxing and stuff.

Yeah, will do. Although even when tracing is enabled, if the activity isn't sampled, it'll still skip A LOT of the processing since it checks first if the Activity is requesting all data.

B.t.w, overhead is expected, but the APIs are very fast and used for both runtime stuff (SQL client, HTTP client) as well as as third party libraries like StackExchange.Redis so i.m.o the overhead when tracing is enabled is acceptable and known. That's why Sampling is a built-in feature for Activities.

Yes, sure it's expected if you enable such a feature, but there are still things we could optimize for to avoid allocations. But this can also wait for later. But we should look at overhead of the two other options of no tracing enabled and tracing enabled but not for RabbitMQ.

One question I had was do we need a separate option to enable it? E.g. my business has tracing enabled, but we'd might not want to know this level of detail from RabbitMQ.

Yes. For example if you are using the OpenTelemetry libraries you'll need to do something like this:

builder.Services.AddOpenTelemetryTracing(builder =>
{
    builder
    .Configure((IServiceProvider serviceProvider, TracerProviderBuilder traceBuilder) =>
    {
        traceBuilder
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("test-app", serviceVersion: "1.0.0"))
        .AddSource("RabbitMQ.Client") // Enable tracing for the RabbitMQ.Client library. Not setting this causes no activities to be created within the library.
        .AddAspNetCoreInstrumentation(options => options.RecordException = true)
        .... // more initialization stuff here
        ;
    });
});

Ah yes, forgot about it, as I only had to do it once for our project :D Thanks for reminding me.

bollhals avatar Oct 13 '22 19:10 bollhals

@bollhals I did some simple benchmarks with the MassTest app modified to do 1000 batches (up from 100) with 500 items each, and seems the overhead is minimal (after I fixed a small bug):

Current master build image

ActivitySource build with OpenTelemetry initialized but not listening for RabbitMQ.Client events image

stebet avatar Oct 21 '22 09:10 stebet

Hi @stebet, thanks for making this PR. Do you think we could have it in the next coming months?

xsoheilalizadeh avatar Nov 10 '22 08:11 xsoheilalizadeh

Rebased this on the latest master and removed the need for OpenTelemetry.API reference as suggested by @rwkarg :)

stebet avatar Mar 13 '23 10:03 stebet

Updated to be in line with current spec. Thanks @Kielek, would love it if you could verify that I got things correcly :)

stebet avatar Mar 14 '23 11:03 stebet

@lukebakken @michaelklishin Not sure what's up with the Concourse build, since the others are now successful (after I restarted one that seemed to hang).

stebet avatar Mar 14 '23 15:03 stebet

Any update about this implementation?

jpablobarrientos avatar Mar 23 '23 22:03 jpablobarrientos

Any update about this implementation?

@jpablobarrientos if there were, you'd see it here.

If you'd like to contribute to moving this forward, you could do some or all of the following:

  • Review the PR and give your input.
  • Actually test the code in the stebet:activitysource and confirm that it does what it should.

lukebakken avatar Mar 24 '23 01:03 lukebakken

@stebet I'm fixing up conflicts now and then I'll review.

lukebakken avatar Mar 24 '23 13:03 lukebakken

@stebet I probably missed something while resolving conflicts. Would you mind checking these assertions out?

https://github.com/stebet/rabbitmq-dotnet-client/blob/activitysource/projects/Unit/ActivitySource/TestActivitySource.cs#L89-L96

lukebakken avatar Mar 24 '23 15:03 lukebakken

@stebet I probably missed something while resolving conflicts. Would you mind checking these assertions out?

https://github.com/stebet/rabbitmq-dotnet-client/blob/activitysource/projects/Unit/ActivitySource/TestActivitySource.cs#L89-L96

I'll take a look on monday :)

stebet avatar Mar 24 '23 17:03 stebet

@lukebakken Just squished the commit history and rebased on latest master, also added async tests for the telemetry.

stebet avatar May 03 '23 10:05 stebet

see also https://github.com/rabbitmq/rabbitmq-java-client/pull/1017#

ikavgo avatar May 04 '23 13:05 ikavgo

@stebet thanks. We have an initiative around OpenTelemetry that has just started so I will delay merging this branch.

lukebakken avatar May 04 '23 13:05 lukebakken

Hi, I created an md file to synchronize and track - https://github.com/rabbitmq/contribute/blob/main/observability/clients.md. Please take a look and change/add missing parts

ikavgo avatar May 04 '23 20:05 ikavgo

Any reason for not using messaging.source.name from OTel messaging conventions and set it to the queue name for the receive and process activities?

acogoluegnes avatar May 05 '23 09:05 acogoluegnes

Any reason for not using messaging.source.name from OTel messaging conventions and set it to the queue name for the receive and process activities?

we're going to remove it in https://github.com/open-telemetry/opentelemetry-specification/pull/3450 - it's proven to be too hard to use messaging.source.name and messaging.destination.name for the same thing and this PR is an excellent example of it.

lmolkova avatar May 05 '23 16:05 lmolkova

I'll update this PR with the changes in spec.

stebet avatar May 06 '23 13:05 stebet

aren't those changes not merged yet? at least context transfer should be merged asap.

ikavgo avatar May 06 '23 20:05 ikavgo

@ikvmw this PR is for work that will ship in the next major release of this client library. We have time to ensure coordination between this and other client libraries.

lukebakken avatar May 07 '23 14:05 lukebakken

Resolved comments from @lmolkova and @rwkarg and I think we are now in-line with the semantic conventions as well.

Rebased on latest main and cleaned up the commits.

Would love if interested parties (@lmolkova, @davidfowl, @rwkarg, @ikvmw, @Kielek) could give this another run through and then it's up to @michaelklishin and @lukebakken to merge when they feel it's good enough :)

stebet avatar Oct 19 '23 11:10 stebet

Not sure how this project documents things in general... I'd propose to document the steps an end user needs to do to get this to work.. It could be as simple as AddSource(sourceName). The sourceName used should be documented at the minimum.

cijothomas avatar Oct 20 '23 15:10 cijothomas

@pyohannes @lmolkova Tagging more folks, who has expertise in messaging conventions to review, if time allows. Thanks!

cijothomas avatar Oct 20 '23 15:10 cijothomas

@pyohannes @lmolkova Tagging more folks, who has expertise in messaging conventions to review, if time allows. Thanks!

Thanks for taking the time and giving this excellent feedback @cijothomas :)

stebet avatar Oct 20 '23 17:10 stebet

One question I have for @lmolkova and @pyohannes. Should the spans for deliver/receive include the time it takes for the client to process the messages (until ack/nack). Or should their duration just be the time it takes to pull (for receive) or dispatch (for deliver) from the message bus?

stebet avatar Oct 24 '23 15:10 stebet