opentelemetry-erlang icon indicating copy to clipboard operation
opentelemetry-erlang copied to clipboard

Add record_exceptions options to with_span

Open albertored opened this issue 1 year ago • 9 comments

Fixes #236

The Python API has been taken as a reference.

There is still something that does not fully convince me.

Elixir exceptions are recorded with exception.type equal to the name of the exception struct (see test).

Standard erlang errors are mapped by Elixir to exception structs so when in Elixir, inside a trace, an exception is raised with :erlang.error(type) the exception is recorded with the erlang type (e.g. error:badarg) but the user get an ArgumentError (see test).

albertored avatar Sep 01 '23 10:09 albertored

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Files Coverage Δ
apps/opentelemetry/src/otel_tracer_default.erl 94.73% <90.00%> (-5.27%) :arrow_down:
apps/opentelemetry_api/src/otel_tracer.erl 69.56% <50.00%> (ø)
apps/opentelemetry_api/lib/open_telemetry/span.ex 22.72% <0.00%> (-2.28%) :arrow_down:
apps/opentelemetry_api/src/otel_span.erl 72.63% <78.78%> (-0.23%) :arrow_down:

... and 1 file with indirect coverage changes

:loudspeaker: Thoughts on this report? Let us know!.

codecov[bot] avatar Sep 01 '23 10:09 codecov[bot]

Sorry for the delay. I'm not sure what to do about badarg vs ArgumentError either. Its probably best if it could be ArgumentError. Is that possible to do?

tsloughter avatar Sep 30 '23 14:09 tsloughter

The only ways I can think of are:

  • duplicating the implementation of the feature on Elixir side
  • somehow add a metadata when using the Elixir version of with_span so that in elrang code we can handle this situation

I'm usually against code duplication but in this case I'm not sure the complexity needed for avoiding duplication is worth.

albertored avatar Oct 02 '23 07:10 albertored

The problem with duplication is the logic is in the SDK and there is only the Erlang SDK. We'd have to have the logic in the API to add it to Elixir.

Hm, couldn't this just convert Erlang atoms like badarg to ArgumentError?

tsloughter avatar Oct 02 '23 09:10 tsloughter

The problem is that in both these cases

 %% ERROR
    ?assertException(error, badarg, otel_tracer:with_span(Tracer, <<"span-error">>, #{record_exception => true},
                                               fun(_SpanCtx) ->
                                                erlang:error(badarg)
                                               end)),

    receive
        {span, SpanError} ->
            ?assertEqual(<<"span-error">>, SpanError#span.name),
            ?assertEqual(undefined, SpanError#span.status),
            [#event{name=exception, attributes=A}] = otel_events:list(SpanError#span.events),
            ?assertMatch(#{'exception.type' := <<"error:badarg">>, 'exception.stacktrace' := _}, otel_attributes:map(A))

    after
        1000 ->
            ct:fail(timeout)
    end,

and

assert_raise ArgumentError, fn ->
        Tracer.with_span "span-1", record_exception: true do
          :erlang.error(:badarg)
        end
      end

      assert_receive {:span,
                      span(
                        name: "span-1",
                        events: {:events, _, _, _, _, [event]},
                        status: :undefined
                      )}

      assert event(name: :exception, attributes: {:attributes, _, _, _, received_attirbutes}) =
               event

      assert %{
               "exception.type": "error:badarg",
               "exception.stacktrace": _
             } = received_attirbutes

the with_span function receives a error:badarg and it needs to know if it should be kept as is in exception.type or if it should convert it to ArgumentError

albertored avatar Oct 02 '23 09:10 albertored

Yea, good point. I don't know what to do here. Maybe it has to stay as badarg.

@bryannaegele any thoughts or should we merge this?

tsloughter avatar Oct 19 '23 10:10 tsloughter

I don't know if it's 1:1 correlation to here, but Phoenix does normalization of erlang exceptions to something elixir can handle.

https://github.com/open-telemetry/opentelemetry-erlang-contrib/blob/main/instrumentation/opentelemetry_phoenix/lib/opentelemetry_phoenix/reason.ex

bryannaegele avatar Nov 04 '23 18:11 bryannaegele

There is something similar also in Elixir core, where Erlang exception are translated to Elixir ones.

This is exactly what is causing problems here: the logic for recording the exception is in the SDK (Erlang side) and we need to know if the with_span is called from Erlang or Elixir in order to decide whether the exception should be translated to Elixir before being recorded.

The only way for doing so that I can think of is passing an argument to the with_span macro

albertored avatar Nov 04 '23 21:11 albertored

I may have found a solution.

I take the first element of the stacktrace and use it to know if the exception was raised from Erlang or Elixir so that I can decide if the exception need to be translated or not.

In this way we also simplify the custom record_exception that was defined on Elixir API because know the SDK is able to handle it without customizations on top. A change on this is that know exception.type for Elixir exception is missing the Elixir. prefix, if it is a breaking change I can add it again

albertored avatar Nov 05 '23 15:11 albertored