azure-sdk-for-net icon indicating copy to clipboard operation
azure-sdk-for-net copied to clipboard

[BUG] - Trace ParentId is not restored from Diagnostic-Id message property

Open lukasz-pyrzyk opened this issue 3 years ago • 9 comments
trafficstars

Library name and version

Azure.Messaging.ServiceBus 7.10.0

Describe the bug

I'm investigating W3C context propagation over ServiceBus. It looks like message sent correctly with valid Diagnostic-Id is not correlated correcly during the message read handling. For both servicebusreceiver.receive and servicebusprocessor.processmessage operations the ParentId property is null. Because of that, these operations are reported as separated traces instead of one trace, started on the web api request.

different parent ids

I tried to set ParentId manually with lines like:

    private async Task ProcessorOnProcessMessageAsync(ProcessMessageEventArgs arg)
    {
        var bodyAsString = arg.Message.Body.ToString();
        var diagnosticsId = arg.Message.ApplicationProperties["Diagnostic-Id"].ToString();
        using var activity = new Activity("ServiceBusProcessor.processing");
        activity.SetParentId(diagnosticsId);
        activity.Start();

        [...]
        activity.Stop();
    }

but it moves only user-code to the correct trace. Both servicebusreceiver.receive and servicebusprocessor.processmessage operations are received outside of the flow image

Expected behavior

ServiceBus reader (both processor and receiver) should use TraceId from Diagnostic-Id as a ParentTraceId

Actual behavior

servicebusreceiver.receive and servicebusprocessor.processmessage don't read ParentTraceId

Reproduction Steps

Available at https://github.com/lukasz-pyrzyk/opentelemetry-demo, API and webjob project, together with HTTP request to start the flow.

Environment

No response

lukasz-pyrzyk avatar Sep 19 '22 20:09 lukasz-pyrzyk

Hi @lukasz-pyrzyk. Thank you for reaching out and we regret that you're experiencing difficulties. The decision to not use the ParentId is intentional, as the library uses links to connect the individual messages activities with the parent, as per the OpenTelemetry messaging spec.

Further discussion can be found in #29907.

I'm going to mark this as addressed, but please feel free to unresolve if you'd like to continue the discussion.

jsquire avatar Sep 19 '22 20:09 jsquire

//cc: @lmolkova, for any additional thoughts.

jsquire avatar Sep 19 '22 20:09 jsquire

Hi @lukasz-pyrzyk. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

ghost avatar Sep 19 '22 20:09 ghost

Hi @jsquire and thank you for the quick reply.

How Links functionality corresponds to the https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-end-to-end-tracing?tabs=net-standard-sdk-2#trace-message-processing ? That code sample gave me the idea to link spans with ParentTraceId, but now it looks like it is not recommended way, because it is done automatically with Links, so I think i got confused with it.

After reading the sentence If you use ProcessMessageAsync of ServiceBusProcessor (message handler pattern) to process messages, the message processing is also instrumented. All Service Bus calls done by your service are automatically tracked and correlated with other telemetry items. I thought that SDK has a bug, because ParentId is not populated automatically.

Is there a chance that the article comes from the times of Microsoft.Azure.Servicebus, where this approach was used?

lukasz-pyrzyk avatar Sep 19 '22 21:09 lukasz-pyrzyk

Is there a chance that the article comes from the times of Microsoft.Azure.Servicebus, where this approach was used

I don't know the origin of the article nor who maintains it, but the code samples appear to have been updated for the current generation and I notice that our telemetry expert is listed as a contributor. @lmolkova: Would you be so kind as to offer your thoughts?

jsquire avatar Sep 20 '22 13:09 jsquire

@lukasz-pyrzyk thanks for reaching out!

We do correlation for messaging scenarios using links, i.e. processor span/activity does not become a child of producer activity, but we add a link to it. We also fixed some issues related to receive and complete/defer/etc in the last 7.11.0-beta.1 - they also use links.

The reason for it is that we don't really know what Activity is current at the moment we receive/settle/process spans. I.e. we want them to be children of your current context. By using links we don't override it, but the downside is that not all backends support links.

As a result, if your backend supports links, you'd see something like this: MicrosoftTeams-image

I don't think we can change this behavior for receive and settle calls, but I believe we can change it for process, where there is no chance for user context to appear before message processing starts.

I believe we can do it in non-breaking way, i.e. keep the link for now and also use the same context as a parent. I saw similar concerns from users in Java and we're starting to set parent instead of a link for processor spans.

lmolkova avatar Sep 20 '22 16:09 lmolkova

Hi @lmolkova and thanks for the quick reply!

I've updated to 7.11.0-beta.1 and unfortunately, on my backend I don't see a link, tried Zipkin and Jaeger (I read somewhere that Jaeger supports it, but unfortunately without a luck for me).

I think adding this behavior for process functionality would be a great improvement. It would simplify the many investigations and answer many questions that come from lack of that. Personally, I thought about this as a bug, because in previously mentioned article setting parentTraceId is done manually. In my PoC from two years ago, I've implemented flow tracing for servicebus manually to achieve this behavior. Now, with official support for the Messaging, I expected to have it out of the box. If it is possible without doing a breaking change, I would vote for it!

lukasz-pyrzyk avatar Sep 20 '22 20:09 lukasz-pyrzyk

Thanks for trying @lukasz-pyrzyk

Jaeger supports it, but unfortunately without a luck for me

They're called 'references' in Jaeger and if you find a process span there and expand it, you should see a list of references, they will point to the message. It only works in one direction, so finding the process span from the message is not possible in Jaeger.

lmolkova avatar Sep 20 '22 21:09 lmolkova

Ok, I got it. I found FOLLOWS_FROM in the references section for ServiceBusReceiver.Complete and ServiceBusProcessor.ProcessMessage actions. I wasn't able to find it on the UI, maybe because link is to the trace from a different application (and it seems like Jaeger filters traces by specific service).

Thank you for great the explanation @lmolkova . I think we can close the issue. I hope that connecting parentTraceId to the Process span will be implemented in the future :)

lukasz-pyrzyk avatar Sep 20 '22 21:09 lukasz-pyrzyk

Hi @lukasz-pyrzyk, since you haven’t asked that we “/unresolve” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve” to reopen the issue.

ghost avatar Sep 27 '22 22:09 ghost