opencensus-specs
opencensus-specs copied to clipboard
Proposal: define standard RPC tags
We should define a few standard tags that we believe will be generally useful and have proven to be so internally at Google.
These tags include:
-
originator
: the first RPC that triggered the chain of requests (trace) -
caller
: the immediate previous RPC -
method
: the current RPC
Some of the open questions:
- [ ] Should these be defined across gRPC, HTTP and maybe in future other RPC protocol or should we have separate tags per RPC protocol
- [ ] Do we need a general mechanism to implement these propagation semantics or should we just hard-code these special tags?
- [ ] Any other tags we should consider adding to this set?
- What is the value of
originator
andcaller
? This only works in systems that have a knowledge of "user" or "service" but does not work in all the systems, because we do not know how to set them. One option could be that the user has to specify them or set a callback on the rpc plugins for example. - We used to have
method
but this tag is not enough, maybe in a combination withprotocol_type
would work.
Also for reasons that are written here https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md#why-different-tag-name-for-serverclient-method we should make these tags incoming_method
and incoming_protocol_type
or server_method
and server_protocol_type
.
I would like to close this asap because I don't think is good to make too many changes that are so critical for the system.
Should these be defined across gRPC, HTTP and maybe in future other RPC protocol or should we have separate tags per RPC protocol
I think in this case we combine the 2 tags that I mentioned, this is a good solution but the downside is that users will have to deal with multiple tags in configs/views. The advantage is that they are very clear where they are added, the disadvantage is that we need special tags for each RPC system and maybe more complicated views when the application uses multiple transport systems.
Do we need a general mechanism to implement these propagation semantics or should we just hard-code these special tags?
If we do use one name hard-coding is good enough, otherwise we can handle this by using what we discuss to implement here https://github.com/census-instrumentation/opencensus-specs/issues/110
Any other tags we should consider adding to this set? I have some tags in my mind (maybe not all of them make sense):
-
protocol_type
ortransport_type
that I mentioned before -
status
may be an interesting tag to handle (but it is different because it propagates backwards, but a standard tag for that is probably good). -
caller_method
the incoming method on the caller.
Just to be clear, I'm not proposing that this replace any of the existing tags. I am not proposing to make any changes to the existing spec.
I think that this proposal is compatible with the existing tags because the existing tags aren't suitable to be propagated and this proposal deals mainly with tags that are propagated (besides "method", which I just added for completeness, we can discuss that separately). I think we need both.
Don't the doesn't the parent_span_id
of a trace equal the caller
? If we recurse up parent_span_id
, don't we get to the proposed originator
? What use of method
makes name
not already appropriate?
I'm not sure what semantics are being implied here that make these additions useful. It feels to me like the existing span structure already has these forms of links & information built in. I'd like to hear some more of why these aren't already sufficient.
@rektide OpenCensus handles both stats/metrics and tracing. These are really separate concerns for now. Tags (as contemplated in this issue) are only available in the stats part of OpenCensus and don't really have any connection to tracing.
Concretely for example, the parent_span_id
is not a useful value for caller
because you could not break down stats by this value. It would be meaningless and very high cardinality (so would not work in time series databases). Something like a service name or a user name would be a meaningful caller
tag (inside Google it is a user name but not sure where we would get this from so we might rather go with a service name).
Could these include a process id information perhaps? (e.g. host/pod name + pid)? These are useful when correlating with specific log events and investigating specific instance performance.
Additionally, one useful element in the opentracing/jaeger instrumentation I have used before is a standard "error" boolean tag, this allows visualisation to easily show spans with known error conditions.
sorry but error boolean is a terrible idea (error=false anyone)... I'd much rather see the absence or availability of an error key with a string payload like Zipkin which allows this visualisation too
@basvanbeek I don't think the idea is terrible. Also, let's avoid that kind of discouraging language please.
Having a error key with a string payload may work fine for a tracing system like Zipkin, but if we are talking about RPC tags which go into time series databases, high cardinality string errors are not an option.
I think there is already a low cardinality status tag anyway and trace.Span has a dedicated Status field.
@ramonza I've already had a bit of practice with some of the less positive tone (fortunately @kjin has been fantastic help over in the NodeJS repo, but for him I'd probably not have stuck around)
If the tags on span may end up propagating into metrics, a raw string tag would be painful.
If the low cardinality status tag is already reasonably standard, that may be enough (assuming a status != OK
style query is easy to support in tracers).