opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

[exporter/googlecloud] Correctly translate exception-events from OpenTelemetry to Google Trace format

Open Jaskaranbir opened this issue 3 years ago • 3 comments

The OpenTelemtry spec provides info on exception-handling that is based adding exception-info as Event: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/exceptions.md

Currently, the exception-events passthrough as-is without any extra processing, which leads to following representation in Trace UI: image Notice that the stacktrace above is incomplete and has been character-limited by GCP, which significantly reduces its usefulness.

Meanwhile, referencing the following screenshot from: https://cloud.google.com/trace/docs/trace-labels image The above screenshot includes a specially formatted section for displaying call-stack, which is extremely helpful.

So it would be great if the exceptions exported by OpenTelemetry could be formatted to Trace's conventions to allow displaying them in correct format as achieved by using native Google.Diagnostic libraries. This seems to require a rather tricky stack-trace parsing (especially considering different languages will have their own stack-trace representation), but is there a possibility to develop some conventions to improve the feasibility of this (such as expecting special tags within spans/traces with segments of pre-parsed stack-trace)?

Just in case, .NET Core is my primary development language here and it seems to have fairly under-developed libraries for exporting Otel data to GCP, which is one of the major reasons for using OpenTelemetry Collector.

Jaskaranbir avatar Aug 01 '22 22:08 Jaskaranbir

Pinging code owners: @aabmass @dashpole @jsuereth @punya @tbarker25 @damemi. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Aug 05 '22 17:08 github-actions[bot]

The description of /stacktrace in the docs you reference looks like it only applies to the v1 API, while we use the v2 API: JSON-formatted stack trace. Stack traces aren't indexed for search.This label is displayed in the Call Stack table.The /stacktrace label is only used by the V1 API.For the V2 API, the span contains a stackTrace field.

So it looks like we would actually want to support setting Span.StackTrace based on the attribute. Otherwise, this seems like it would be nice to have, but it looks like a hard problem to solve. We will probably keep it on the backlog for now, but we appreciate the feedback.

dashpole avatar Aug 05 '22 20:08 dashpole

Thanks for the response @dashpole. I see that StackTrace constitutes the StackFrame here: https://github.com/googleapis/go-genproto/blob/01dd62135a58/googleapis/devtools/cloudtrace/v2/trace.pb.go#L1226

With my limited context, is it still as complex if the client can be expected to handle transforming raw stacktrace into stackframes object above, and pass the structured response via attribute? Even a rudimentary implementation of this would be quite useful.

Jaskaranbir avatar Aug 05 '22 21:08 Jaskaranbir

I don't know if that would be a great contract for the exporter to have. It would be fairly difficult for clients to implement correctly, so I would prefer having something in the exporter. I'm wondering if starting with something as simple as "split the stacktrace into one attribute for each line" (e.g. exception.stacktrace.0, exception.stacktrace.1) would make this useable while we figure out how we want to handle this...

dashpole avatar Aug 08 '22 15:08 dashpole

So it'll display in the TraceUI as multiple tags for now? That sounds good enough as starting point to me.

Jaskaranbir avatar Aug 09 '22 16:08 Jaskaranbir

A teammate pointed me to https://cloud.google.com/trace/docs/quotas, which says we are limited to 32 attributes per-span. I'm worried that the multiple-attribute approach would hit that limit ~instantly, and just end up causing other attributes to be dropped...

dashpole avatar Aug 09 '22 18:08 dashpole

The limit does make sense.

Referring back to a previous comment about having a defined contract for stacktrace, would it be an option to have provisions, based on some configurable options of the exporter, for allowing users to either format their own stacktrace or allowing them to pass a raw stacktrace and attempting to format it?

Since stacktrace format isnt standardized across languages and tools, I think the latter of the above options would be quite difficult to achieve. Although we can still have a best-attempt basis implementation that tries to format a raw stacktrace. But allowing users to define their own call-stack/stacktrace formats will be easier to implement and more flexible. Then in the future, if the latter option works out, the former option (of expecting a pre-structure stacktrace) can be deprecated.

Note: The auto-formatting raw stacktrace doesnt have to be implemented right now. I am really just referring to structuring the code in a way that allows moving towards that goal in future, but start with easier implementation instead that still allows using Trace to its ideal extent.

Jaskaranbir avatar Aug 09 '22 19:08 Jaskaranbir

I could totally see that working for an SDK exporter (e.g. provide a string -> StackTrace function), but that sounds hard to do in the collector via yaml config.

dashpole avatar Aug 09 '22 19:08 dashpole

I was thinking about an implementation where the formatted stacktrace is structured and added as a tag containing JSON data. The JSON structure could even be what V1 API uses for stacktraces:

    "stackTrace": {
      "stackFrames": {
        "frame": [
          {
            "functionName": {
              "value": "serverMethodTrace [as func]"
            },
            "fileName": {
              "value":
              "/usr/src/app/node_modules/@google-cloud/trace-agent/build/src/plugins/plugin-grpc.js"
            },
            "lineNumber": "249",
            "columnNumber": "28"
          },
          {
            "functionName": {
              "value": "anonymous function"
            },
            "fileName": {
              "value": "/usr/src/app/node_modules/grpc/src/server.js"
            },
            "lineNumber": "592",
            "columnNumber": "13"
          }
        ]
      }
    }

So the exporter receives a defined tag with this JSON structure and uses it to build the V2-API call-stack. On the configuration side for the exporter, there's option to either parse the data from this tag as pre-formatted JSON-stacktrace or the raw-stacktrace (for now, it can only support one option, that of pre-formatted stacktrace).

What are your thoughts on something like this?

Jaskaranbir avatar Aug 10 '22 00:08 Jaskaranbir

It seems really hard for users to implement... I'll have to chew on it a bit.

Just for fun, can you try adding this in your exporter config:

trace:
    attribute_mappings:
    - key: exception.stacktrace
      replacement: /stacktrace

To see if the /stacktrace field still works in the v2 API?

dashpole avatar Aug 10 '22 19:08 dashpole

Sorry, was busy in last few days. Though I think some major languages already seem to include stacktrace parsers to an extent. For example, Go, Ruby, Dotnet, JavaScript (albit with external packages, but there are lots available). It'll just be about mapping an object to another.

And considering this is an alternative implementation that can be deprecated anytime and is easy enough to implement, this seems good enough as an optional drop-in until better implementation.

Will test out the /stacktrace field with V2 API when I get chance (likely sometime this weekend), thanks for the idea!

Jaskaranbir avatar Aug 12 '22 18:08 Jaskaranbir

Finally had the chance to try this out. Though adding the exception.stacktrace attribute didnt quite work, it just adds a /stacktrace label in UI with clipped stacktrace text (same result as issue description).

@dashpole any further thoughts on the intermediate workaround (https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/12885#issuecomment-1210018097)

Jaskaranbir avatar Aug 29 '22 00:08 Jaskaranbir

Not right now. If we did go that route it is likely something we would have to keep around (we wouldn't risk breaking users by deprecating it), so i'd like to give it more thought before moving forward.

dashpole avatar Aug 31 '22 16:08 dashpole

Sounds good, keeping my fingers crossed for this 🤞

Jaskaranbir avatar Sep 05 '22 18:09 Jaskaranbir

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • exporter/googlecloud: @aabmass @dashpole @jsuereth @punya @tbarker25 @damemi

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Nov 10 '22 03:11 github-actions[bot]

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • exporter/googlecloud: @aabmass @dashpole @jsuereth @punya @tbarker25 @damemi

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Jan 10 '23 03:01 github-actions[bot]

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions[bot] avatar May 27 '23 02:05 github-actions[bot]