opentelemetry-dotnet icon indicating copy to clipboard operation
opentelemetry-dotnet copied to clipboard

SetErrorStatusOnException with async Exceptions works incorrectly

Open QBens opened this issue 3 years ago • 7 comments

Bug Report

List of nugets

  • OpenTelemetry 1.1.0
  • OpenTelemetry.Exporter.Console 1.1.0

Runtime version:

  • net6.0

Symptom

When SetErrorStatusOnException is enabled and exception occurred and was caught in nested async methed, the outer activities are incorrectly tagged ar Error.

What is the expected behavior?

When all exceptions were caught in async methods, outer activities should not be tagged ar Error.

What is the actual behavior?

Currently when exception was thrown and caught in async method, outer activities are tagged as Error.

Reproduce

Code to reproduce the problem:

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

public class Program
{
    private static readonly ActivitySource MyActivitySource = new ActivitySource(
        "MyCompany.MyProduct.MyLibrary");

    public static async Task Main()
    {
        using var tracerProvider = Sdk.CreateTracerProviderBuilder()
            .AddSource("MyCompany.MyProduct.MyLibrary")
            .SetSampler(new AlwaysOnSampler())
            .SetErrorStatusOnException()
            .AddConsoleExporter()
            .Build();

        try
        {
            using (var activity1 = MyActivitySource.StartActivity("Foo"))
            {
                using (var activity2 = MyActivitySource.StartActivity("Bar"))
                {
                    await Bar();
                    // no exception here
                }
            }
        }
        catch (Exception)
        {
            // swallow the exception
        }
    }

    private static async Task Bar()
    {
        await Task.Delay(100);
        try
        {
            await Bar2();
        }
        catch
        {
            // do nothing
        }
    }

    private static async Task Bar2()
    {
        await Task.Delay(100);
        throw new Exception("Oops!");
    }
}

Current result:

Activity.Id:          00-a74738702a69ac229573668ac075e01c-456b2d43151b21aa-01
Activity.ParentId:    00-a74738702a69ac229573668ac075e01c-bb28a8612b992338-01
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: Bar
Activity.Kind:        Internal
Activity.StartTime:   2021-12-17T10:18:34.8695981Z
Activity.Duration:    00:00:00.2853080
Activity.TagObjects:
    otel.status_code: ERROR
Resource associated with Activity:
    service.name: unknown_service:OpenTelemetryPoligon

Activity.Id:          00-a74738702a69ac229573668ac075e01c-bb28a8612b992338-01
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: Foo
Activity.Kind:        Internal
Activity.StartTime:   2021-12-17T10:18:34.8683046Z
Activity.Duration:    00:00:00.3186132
Activity.TagObjects:
    otel.status_code: ERROR
Resource associated with Activity:
    service.name: unknown_service:OpenTelemetryPoligon

Both activites are tagged with otel.status_code: ERROR

QBens avatar Dec 17 '21 10:12 QBens

I also hit this today. Are there eyes on this?

nathansoz avatar Mar 15 '22 00:03 nathansoz

Bump

nathansoz avatar Jun 24 '22 06:06 nathansoz

I spent some time looking at this today and was not able to reproduce the issue with the provided example.

I tested a few other combinations inside nested activities:

  • exception - both activities have tag.
  • exception within try/catch - no tag.
  • async exception - both activities have tag.
  • async exception within try/catch (provided example) - no tag.

I went looking for the specification on "otel.status_code" and found the definition of "ERROR": https://github.com/open-telemetry/opentelemetry-specification/blob/main/semantic_conventions/trace/exporter/exporter.yaml

id: error value: ERROR brief: 'The operation contains an error.'

Based on this, I think the current version of the SDK is working as intended. That is, if the inner activity contains an error, the outer activity also contains an error.

TimothyMothra avatar Nov 30 '22 00:11 TimothyMothra

The reproduction in the original issue worked for me. Here's a variant of it:

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

public class Program
{
    private static readonly ActivitySource MyActivitySource = new ActivitySource(
        "MyCompany.MyProduct.MyLibrary");

    public static async Task Main()
    {
        using var tracerProvider = Sdk.CreateTracerProviderBuilder()
            .AddSource("MyCompany.MyProduct.MyLibrary")
            .SetSampler(new AlwaysOnSampler())
            .SetErrorStatusOnException()
            .AddConsoleExporter()
            .Build();


        using (var activity1 = MyActivitySource.StartActivity("OutsideBar"))
        {
           
                await Bar();
                // no exception here
        }

    }

    private static async Task Bar()
    {
        await Task.Delay(100);
        try
        {
            using (var activity2 = MyActivitySource.StartActivity("InsideBar"))
            {
                await Bar2();
            }
        }
        catch
        {
            // do nothing
        }
    }

    private static async Task Bar2()
    {
        await Task.Delay(100);
        throw new Exception("Oops!");
    }
}

Gives me these outputs:

Activity.TraceId:          0155f5a9391759baeb40766f955631a5
Activity.SpanId:           b8a0ee2d794ad634
Activity.TraceFlags:           Recorded
Activity.ParentSpanId:    04f4e22a7159253a
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: InsideBar
Activity.Kind:        Internal
Activity.StartTime:   2023-01-30T15:47:04.4927410Z
Activity.Duration:    00:00:00.1042580
Activity.Tags:
   StatusCode : ERROR
Resource associated with Activity:
    service.name: unknown_service:otelrepro

Activity.TraceId:          0155f5a9391759baeb40766f955631a5
Activity.SpanId:           04f4e22a7159253a
Activity.TraceFlags:           Recorded
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: OutsideBar
Activity.Kind:        Internal
Activity.StartTime:   2023-01-30T15:47:04.3857490Z
Activity.Duration:    00:00:00.2627850
Activity.Tags:
   StatusCode : ERROR
Resource associated with Activity:
    service.name: unknown_service:otelrepro

I'd expect InsideBar to have an error status code, but not OutsideBar. If I remove all the async/awaits and make everything sync, then it works as expected (InsideBar has an error, OutsideBar doesn't)

dotnet 7.0 with these OTEL versions:

  <ItemGroup>
    <PackageReference Include="OpenTelemetry" Version="1.3.2" />
    <PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.3.2" />
  </ItemGroup>

BenHewins avatar Jan 30 '23 15:01 BenHewins

Fairly sure I've just hit this problem in our code. Unfortunately this setting isn't useful to us without fixing this issue - if an exception is caught I don't want higher up activities being marked as errors.

olane avatar Jul 17 '23 13:07 olane