azure-sdk-for-net icon indicating copy to clipboard operation
azure-sdk-for-net copied to clipboard

[BUG] Correlation between Producer and Consumer Traces is broken in Azure.Message.ServiceBus OpenTelemetry Support

Open rakesh-308 opened this issue 2 years ago • 4 comments

Library name and version

Azure.Message.ServiceBus 7.11.1

Describe the bug

I am posting a message to a service bus topic in one of my web api endpoint as below

[HttpPost]
public async Task<IActionResult> PostServiceBusMessage()
{
    _logger.LogTrace("{functionName} requested", nameof(PostServiceBusMessage));

   string connectionString = configuration.GetValue<string>("ServiceBusConnectionStringSend") ;
    string topicName = configuration.GetValue<string>("topicName");

    await using var client = new ServiceBusClient(connectionString);

    ServiceBusSender sender = client.CreateSender(topicName);

    var message = new ServiceBusMessage("Hello world!")
    {
        ScheduledEnqueueTime = DateTime.UtcNow,
        MessageId = Guid.NewGuid().ToString("D")
    };
    await sender.SendMessageAsync(message);
    return Ok();
}

A consumer class is defined in the same project inheriting the BackgroundService like

using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

namespace Tracing
{
    public class AzureServiceBusConsumer : BackgroundService
    {
        private readonly ServiceBusProcessor _processor;
        private readonly ILogger _logger;

        public AzureServiceBusConsumer(
            ILogger<AzureServiceBusConsumer> logger,
            IConfiguration configuration)
        {
            _logger = logger;
            _processor = new ServiceBusClient(configuration.GetValue<string>("ServiceBusConnectionStringListen"))
                .CreateProcessor(configuration.GetValue<string>("topicName"), configuration.GetValue<string>("subscriptionName"));
        }

        private async Task ProcessMessagesAsync(ProcessMessageEventArgs args)
        {
            var message = args.Message;
            if (message.ApplicationProperties.TryGetValue("Diagnostic-Id", out var objectId) && objectId is string diagnosticId)
            {
                Console.WriteLine($"----- Diagnostic-Id = {diagnosticId}");
            }
            _logger.LogInformation("Message found to process");
            await args.CompleteMessageAsync(message);
            return;
        }

        public override async Task StopAsync(CancellationToken cancellationToken)
        {
            await _processor.StopProcessingAsync(cancellationToken);
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            _processor.ProcessMessageAsync += ProcessMessagesAsync;
            _processor.ProcessErrorAsync += ProcessErrorAsync;
            await _processor.StartProcessingAsync(stoppingToken);
        }

        private Task ProcessErrorAsync(ProcessErrorEventArgs args)
        {
            var message = args.Exception.Message;
            _logger.LogInformation(message);
            return Task.CompletedTask;
        }
    }
}

It is set up in Startup ConfigureServices(IServiceCollection services) as

services.AddHostedService<AzureServiceBusConsumer>();

When I send a message to service bus via invoking my web api endpoint, It is received by the consumer but totally with a different trace Id and the link between sender and consumer traces is totally broken.

image

TraceIds highlighted in yellow are within the webapi method where a message is sent to this service bus topic, ApplicationProperty "Diagnostic-Id" contains this information, BUT the TraceId in the consumer side is totally different (Underlined in red color)

Jaeger is not happy with this and does not show anything about the consumer side because of this totally different traceId. image

End to end Observability is broken in this scenario.

Expected behavior

Service bus Consumer should expose Activity same TraceId as of Producer.

Actual behavior

TraceIds are totally different between Producer and Consumer and this OpenTelemetry Instrumentation seems broken.

Reproduction Steps

It is reproducible every time with steps and sample code mentioned in description.

Environment

Service bus is running in Azure.

Windows 11

Microsoft Visual Studio Enterprise 2022 (64-bit) - Current Version 17.4.1

rakesh-308 avatar Jan 06 '23 10:01 rakesh-308

I forgot to mention that I have enabled the experimental open telemetry support as per https://devblogs.microsoft.com/azure-sdk/introducing-experimental-opentelemetry-support-in-the-azure-sdk-for-net/

rakesh-308 avatar Jan 06 '23 11:01 rakesh-308

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire avatar Jan 06 '23 14:01 jsquire

One initial thing to check - are you sure you are receiving the same message that you are sending in your repro?

JoshLove-msft avatar Jan 09 '23 23:01 JoshLove-msft

One initial thing to check - are you sure you are receiving the same message that you are sending in your repro?

Yes @JoshLove-msft ; I have cross checked it couple of times that there is no pending message in subscription and I am getting the same message as I sent on topic.

rakesh-308 avatar Jan 10 '23 08:01 rakesh-308

I can verify this issue. Tested with Azure.Messaging.ServiceBus 7.12.0

Workaround with new parent activity:

private async Task HandleMessage(ProcessMessageEventArgs arg)
{
    var parentId = arg.Message.ApplicationProperties["Diagnostic-Id"].ToString();
    using var activity = _activitySource.StartActivity(nameof(HandleMessage), ActivityKind.Consumer, parentId);
    // simmulate some acitivity
    await Task.Delay(50);
}

rohinz avatar Jan 17 '23 10:01 rohinz

Issue appears to be that in the ActivitySource path in DiagnosticScope, we are not using the overload for ActivitySource.Start that takes a parentId parameter. The issue appears when the parent Id is set on the DiagnosticScope before the scope is actually started, which is the case in the ServiceBus.Process span. When using SetParentId on an already started scope, it appears to work as expected. However, the fix should be to make DiagnosticScope correctly start the ActivitySource with the correct parent.

JoshLove-msft avatar Jan 26 '23 06:01 JoshLove-msft