zipkin-gcp icon indicating copy to clipboard operation
zipkin-gcp copied to clipboard

Error label & SpanKind

Open fzakaria opened this issue 4 years ago • 10 comments

I'm seeking guidance on the best way to find associated mappings for SpanKind & errors.

SpanKind

The current codebase has the following snippet:

// Add Kind as a tag for now since there is no structured way of sending it with Stackdriver
// Trace API V2
if (zipkinSpan.kind() != null) {
  attributes.putAttributeMap(kKindLabelKey, toAttributeValue(kindLabel(zipkinSpan.kind())));
}

According to the documentation there seems to be structured field for reporting the kind. https://cloud.google.com/trace/docs/reference/v2/rest/v2/SpanKind

Does this matter for the purpose of the CloudTrace UI?

Error

I believe there to be a semantic gap between Zipkin's, specifically Brave's, concept of an error tag and the corresponding error value in CloudTrace.

It is conventional in Zipkin to add a possibly empty "error" tag when a span failed. In Brave the backfilling of this tag with an Exception type name (or message) is deferred until it is known to be Zipkin2 format.

CloudTrace according to their documentation supports the following:

/error/message: An error message. /error/name: Display name for the error. (https://cloud.google.com/trace/docs/trace-labels)

StackTrace https://cloud.google.com/trace/docs/reference/v2/rest/v2/StackTrace

It would be great if the AttributesExtractor could translate the error tag accordingly.

From my investigation of the code, it seems like the Span translation happens from Zipkin2 format rather than brave's Span which may unfortunate since the Throwable information is then lost. (Would possibly allow creation of the StackTrace attribute)

fzakaria avatar May 14 '20 03:05 fzakaria

Sorry we missed this, @fzakaria. Have you been able to get error messages into Trace?

elefeint avatar Oct 06 '20 14:10 elefeint

I don't think so although I've lost context on it. I've been very happy over all with the GCP sender so this was more of a cherry on top.

fzakaria avatar Oct 06 '20 14:10 fzakaria

Thank you!

elefeint avatar Oct 06 '20 14:10 elefeint

for future readers I still believe it to be a gap using GCPSender but achieving 100% is likely out of scope for this as a compatability layer, that case better suited with using a library more native to CloudTrace

fzakaria avatar Oct 06 '20 15:10 fzakaria

That's fair -- if this requirement comes up again, this issue should be reopened.

elefeint avatar Oct 06 '20 15:10 elefeint

FWIW the throwable isn't lost in Brave. It is just that this implementation loses it. There's a SpanHandler in brave made specifically for the case of parsing throwables.

That no one yet has implemented SpanHandler here is a side-effect of a volunteer-based project. No one is actively doing things unless they feel like it. That doesn't mean it can't be done, but I agree "sender" is not the way out.

I'm re-opening as this issue isn't solved and the data model is a generic, not a personal requirement.

codefromthecrypt avatar Oct 06 '20 23:10 codefromthecrypt

there are two issues here, btw.

The SpanKind maps exactly to Brave, this can be fixed The error labels do not indicate they must be a Throwable, so this is a bit misclassified.

The service is not just java anyway, and literally GRPC examples are used. I recommend simply falling back to setting both to the same value as "error" when non-empty and other tags don't exist https://cloud.google.com/trace/docs/trace-labels

Later, when a SpanHandler is implemented, a Throwable can be considered when parsing (along with the "error" tag) http://zipkin.io/brave/5.12.6/brave/brave/handler/SpanHandler.html

codefromthecrypt avatar Oct 07 '20 00:10 codefromthecrypt

Some other things that can be done..

based on the description of error, one of them can be the RPC status, which we have a field for

"rpc.error_code", eg "CANCELLED" "error" the RPC error code if there is no exception

https://github.com/openzipkin/brave/tree/master/instrumentation/rpc#span-data-policy

Almost the same on http

"http.status_code" when the status is not success. "error", when there is an exception or status is >=400

https://github.com/openzipkin/brave/tree/master/instrumentation/http#span-data-policy

Anyone free to fix this one? looks easy peasy

codefromthecrypt avatar Oct 07 '20 00:10 codefromthecrypt

The missing compatibility for kind does seem trivial and maybe should be broken into a separate issue. It seems pretty straightforward.

I think you lost me a little with the error codes; I am a bit out of touch with the codebase as I am newly back from paternity leave.

I recommend simply falling back to setting both to the same value as "error" when non-empty and other tags don't exist

I think you mean: "I recommend simply falling back and to setting /error/message and /error/name to the Span's error value when non-empty.

If so, that seems like a straightforward change for the attribute extractor as well.

based on the description of error, one of them can be the RPC status, which we have a field for

The values look to be the same for "error" and either "http.status_code" & "rpc.error_code". I'm not clear on your recommendation that one of the CloudTrace tags (/error/message or /error/name) can that value.

fzakaria avatar Oct 07 '20 02:10 fzakaria

Thanks @fzakaria. honestly the mapping isn't that tight it says literally:

/error/message | An error message. "Rendezvous of RPC that terminated with: status = StatusCode.UNAVAILABLE details = OS Error." /error/name | Display name for the error.

So, my guess is that this is a long name (/error/message) short name (/error/name) thing until someone says otherwise.

In that case...

/error/message should be span.tags[error] if not empty /error/name should be prioritize span.tags[rpc.error_code], span.tags[http.status_code], then span.tags[error] if not empty

sometimes having some rules is better than none, as at least it can get feedback and it is better than doing nothing!

codefromthecrypt avatar Oct 07 '20 10:10 codefromthecrypt