NServiceBus.Extensions.Diagnostics
NServiceBus.Extensions.Diagnostics copied to clipboard
Exception Handling
I am unable to find any mention of exception handling using the Activity API. I am curious about your thoughts on this. As it stands, Incoming/Outgoing Message failures are not captured and are therefore unavailable to downstream telemetry.
https://github.com/jbogard/NServiceBus.Extensions.Diagnostics/blob/b7ef20b5f579809c5ed69ac7518c9d1ddae89884/src/NServiceBus.Extensions.Diagnostics/OutgoingPhysicalMessageDiagnostics.cs#L21-L37
Something like this would allow the Activity to propagate the Exception (admittedly in an awkward way):
using (var activity = StartActivity(context))
{
if (activity != null)
{
InjectHeaders(activity, context);
}
try
{
await next().ConfigureAwait(false);
}
catch(Exception ex)
{
activity.SetCustomProperty("Exception", ex);
}
finally
{
if (_diagnosticListener.IsEnabled(EventName))
{
_diagnosticListener.Write(EventName, context);
}
}
}
Have you thought about this? Is there a better way?
I guess you'd want to re-throw the exceptions recoverability can kick in.
Yeah I think the problem is there's no easy way to detect when a message is sent to the error queue. It's just not exposed today. I can do the exception but I think the better route would be to hook directly into the NSB error processing, which is also not exposed today.
And it is not feasible to do something with this kind of hook in the Feature.Setup method?
var recoverability = context.Settings.Get<RecoverabilitySettings>();
recoverability.Failed(settings => settings.OnMessageSentToErrorQueue(message =>
{
System.Diagnostics.Activity.Current?.SetCustomProperty("exception", message.Exception);
return Task.CompletedTask;
}));
Also setting the Activity Status is important for Application Insights Telemetry.
activity.SetStatus(ActivityStatusCode.Ok);
activity.SetStatus(ActivityStatusCode.Error, "error description"); // exception stacktrace as description?
"Yeah I think the problem is there's no easy way to detect when a message is sent to the error queue." I don't think this should be definition of an error. I don't care if the message is being retried or ends op in the error queue: it failed and I want to see that :-).
I want to see the retries in the Telemetry: I think this is valuable information to show (it helped me multiple times to identify issues). The only downside is that Exceptions are overly reported, because of the retries, but I prefer this, then to hide them.
I don't think I saw that failure config, this should be easy enough to add.
From my tests, just setting the status on the activity doesn't propagate that information into the exporter tool.
To make that work, you need to set specific Otel tags:
activity?.SetTag("otel.status_code", "ERROR");
activity?.SetTag("otel.status_description", ex.Message);
Reading this issue here https://github.com/open-telemetry/opentelemetry-dotnet/issues/2569 , its not being doing on purpose. It's saying it's the responsibility of the exporter(s) to generate those tags.