opentracing-java
opentracing-java copied to clipboard
Should there be such a thing as a `null` activeSpan()? Or SpanContext?
Now that #115 is in, we need to contend with the scenario where ActiveSpanSource.activeSpan()
/ Tracer.activeSpan()
returns null
.
As I see it, there are three options.
- Do nothing. Callers check for
null
and get on with their lives. - Specify that implementations use some sort of
NoopSpan
, but do not specify which - Specify that implementations use a particular instance of a particular object... maybe
NoSpan.INSTANCE
instead ofNoopSpan.INSTANCE
. I.e., the caller could do equality testing and be guaranteed to check for the "no-span Span" without adding a newisReal()
sort of method to theSpan
interface. Of course the various tagging methods would do nothing here.
There is the related case of Tracer.extract()
when there's, well, nothing to extract. Should it return null, an unspecified non-null SpanContext
, or a particular SpanContext instance to represent the case of being absent?
See also https://github.com/opentracing/opentracing-java/pull/115/files/bf4606c596955c7e353cfa92bba372d5ecd2e206#r114108185 and https://github.com/opentracing/opentracing-java/pull/115/files/bf4606c596955c7e353cfa92bba372d5ecd2e206#r109821172 where this came up. And also https://github.com/opentracing/opentracing-java/pull/111#discussion_r111682133 .
(cc @tedsuo @objectiser @ivantopo @felixbarny @mabn @sjoerdtalsma @yurishkuro @cwe1ss @pavolloffay @wu-sheng)
Having described the problem, my vote is to do nothing for now. My experience has been that tags are almost always added right when a Span is started or right before it's finished (or, now, deactivated)... in either of those cases, there needs to be a Span instance on hand.
For logging adapters, the null check doesn't bother me at all.... it's isolated to a single call site.
The case of ad hoc / opportunistic logging to an ActiveSpan
is the most troubling one to me, but I kinda wanted to wait until we've actually seen that pattern and problem crop up for real use cases before designing new API surface area around it.
That said, happy to hear other opinions.
Agree, option 1 is fine for now.
It's pity that we cannot use Java8. Then Optional
could be used.
My concern with NoSpan
is that it can lead to data loss. In the most situations if there is no active span new span will be created. Therefore a check is required or method like ~Tracer.activeOrNewSpan()
(wow it looks bad).
I opened https://github.com/opentracing/opentracing-java/issues/133 because of this. But this is a different case - there are two:
- I want to make sure that given thing is recorded, create span if it doesn't exist yet
- I want to record something, but if only there is an active span
My experience has been that tags are almost always added right when a Span is started or right before it's finished
But this holds only true for "agent" code i.e. code which manages the life cycle of a span and adds standard tags. This code might actually be in a agent library and thus is no end-user facing code. Null checks would be ok in this code. However, adding tags to a span is something end-user facing. Therefore the API should be designed in a way that there are as few gotchas for the user as possible and that the code he does not has to write a lot of boiler plate.
There are two or three things to discuss here:
- Is ActiveSpanSource.activeSpan() end-user facing?
- If yes, how should we change the API to avoid pitfalls and boiler plate for the end user?
- If not, do we need to change the API?
1) Is ActiveSpanSource.activeSpan() end-user facing?
Citing myself from a previous discussion on a use case that involves adding tags in end-user code:
The use case that I have in mind is when ot is used by an agent, users should still be able to add custom tags. For example, I've implemented a search analytics Kibana dashboard on top of stagemonitor and OT for a customer. It works by adding meta info about the search via tags like no of hits and the search term. For these kinds of things it would be nice if null checks could be avoided.
In the stagemonitor mailing list a user asked how to get performance stats per tenant in his multi tenant application: https://github.com/stagemonitor/stagemonitor-mailinglist/issues/21#issuecomment-300488083. My suggestion was to add a tenant
tag. Thinking about it, a baggage item could be appropriate too...
2) If yes, how should we change the API to avoid pitfalls and boiler plate for the end user?
Currently, adding a tag to the current span would look like this:
ActiveSpan span = GlobalTracer.get().activeSpan()
if (span != null) {
span.setTag("tenant", tenant);
}
I think this is error prone as the user could forget the null check (-> NPE) and requires them to write more code than necessary.
From a users perspective, it would be nicer if activeSpan()
would return a Noop Span in case there is no active span.
GlobalTracer.get().activeSpan().setTag("tenant", tenant);
If we want something like that, there are some things to consider:
a) Should we rename the activeSpan
method to incorporate that this method might return a noop?
b) Are there use cases where we need to know wether there actually is a active span or not? Are these use cases end-user facing or "agent-code-facing"?
c) If yes, how should we design the API accordingly?
Two quick suggestions:
- Add
ActiveSpanSource#hasActiveSpan()
: Assuming that mostly agent code has to check for this case, it would make sense to add this little inconvenience (checking forActiveSpanSource#hasActiveSpan()
instead of checking for activeSpan() != null) for "power users" instead of having a pitfall for end users (forgetting null check -> npe) - Add
Span.isSampled()
: This does not have quite the same semantics but in the end, might just what is needed. Is it really important to be able to distinguish the situations where there is no active span vs there is a active span but it is not sampled? Maybe in some cases this could break Span wrappers?
3) If not, do we need to change the API?
Probably not.
@felixbarny Why do you concern for "agent-code-facing"
? IMO, whether the API style is in agent, it is irrelevant with OT-java API.
OpenTracing-Java faces the end-user for tracing system, and the agent codes are running in the shadow. Users don't know and almost don't care.
For @bhs question, I think Option-1 will be enough for now. Because even returning the NoopSpan
, as an end user, I have to check, by instanceof
or something else. This is for better performance, no span means no need to tag, also means no need to obtain the tag values.
With agent code I mean code which manages the life span of a span and adds standard tags. No matter how it actually is implemented (javaagent, servlet filter, ...)
With end-user I mean someone who is using some OT-based agent for example.
Users don't know and almost don't care.
End users might still want to add custom tags which are specific to their application and thus can't be added by an agent. I've given two examples of that in 1)
I have to check, by instanceof or something else. This is for better performance, no span means no need to tag, also means no need to obtain the tag values.
Ok, performance is one use case where I need to know if it is a real span or not. This could be handled with both of my suggestions about adding ActiveSpanSource#hasActiveSpan()
and Span#isSampled()
.
But the performance is only one consideration. My argument about the potential NPE pitfall still holds true. Also, when there is no performance overhead in obtaining the tag value, a Noop Span could save some typing (no need to add a null check).
If people want to use OT-based agent, and also want to add some custom tags, sure, the agent must provide some bridge mechanism to make this happened. Still I am sure people can't use the agent API directly in their application. That is my argument. Like sky-walking APM, we provide application toolkit as a bridge to make this works. Also, this is not important for this discussion.
This could be handled with both of my suggestions about adding ActiveSpanSource#hasActiveSpan() and Span#isSampled().
Yes, they can. My choice is based on my coding habit, check null
is not bad. Even by using these two APIs, we also need to get activeSpan
first, and then check it within a if
statement. No difference between if (span != null)
and if(ActiveSpanSource#hasActiveSpan())
.
So I vote for 1)
Still I am sure people can't use the agent API directly in their application.
IMO, the API should be the OT-API so the user is not locked in to a specific vendor which is the purpose of OT.
No difference between if (span != null) and if(ActiveSpanSource#hasActiveSpan())
True, but there is a difference between
ActiveSpan span = GlobalTracer.get().activeSpan()
if (span != null) {
span.setTag("tenant", tenant);
}
and
GlobalTracer.get().activeSpan().setTag("tenant", tenant);
When there are no performance considerations to get the tenant, the second version is more concise and less error prone.
Another suggestion would be to add an additional method to ActiveSpanSource
:
Optional<ActiveSpan> activeSpanOptional();
I'm aware (and in favor) of the Java 6 compatibility of OT-API. This suggestion would actually not break anything. When using Java 8+ you can simply call GlobalTracer.get().activeSpanOptional().ifPresent(span -> span.setTag("tenant", tenant));
. When using Java 6 for example, you can't call the activeSpanOptional()
method as it would throw a compilation error but you can still use the activeSpan()
method.
ActiveSpanSource:Optional<ActiveSpan> activeSpanOptional();
This is good, only the Java level as you mentioned. Is this really possible to build lib like you said? You can't use the java-8 lib in Java-6.
Yes, that is possible. You can still set source and target version to 1.6 as there are no new language features used. But you'll have to compile the code with Java 8 so that the Optional class is in the compile path. This is btw. how Spring supported Java 8 features in 4.x while still maintaining Java 6 compatibility.
The only downside I know of is that if you call something like this ActiveSpanSource.class.getMethods()
, if would throw an exception when not using Java 8+.
Good to know, but seem a semi-compatibility way. Not so sure it's good in API level.
I'm also unsure if this is the right approach for OT but wanted to mention it.
@felixbarny Thanks.
@felixbarny My idea is to instead of this:
GlobalTracer.get().activeSpanOptional().ifPresent(span -> span.setTag("tenant", tenant));
use this (with the same semantic):
GlobalTracer.get().withActiveSpan(span -> span.setTag("tenant", tenant));
It does not depend on java 8 and is concise (when using lambdas).
@mabn I like this idea (I think you mentioned it on the old globaltracer project before). I assume to work with java < 8, we need to define a single-op 'function' interface for the activespan action.
Also the naming of with may be confusing? People may assume it will somehow activate some span if there is no active span? Maybe onActiveSpan(...) or ifActiveSpan(...) may be less confusing as it doesn't overload the with that's often used on builders.
Naming aside, I'm :+1: for this idea for its compatibility.
Sounds good to me. Which interface would be used for the lambda?
A new one, there's no Consumer<X>
before Java 8.
GlobalTracer.get().activeSpanOptional().ifPresent(span -> span.setTag("tenant", tenant));
+1 on this.
A new one, there's no Consumer<X> before Java 8.
True, but we could define a custom functional interface just for the purpose of being used in withActiveSpan
.
(very sorry for falling off the map PR/Issue-wise)
I've fully caught up on this.
It's still my opinion that in almost all scenarios where one is setting tags, one is also starting a new Span/ActiveSpan. As such, I don't mind the activeSpan()
null
checks since tpically there will be a guaranteed-not-null ActiveSpan
on hand when tags are being set.
That said, @mabn's suggestion seems like the right pattern if we're going to pursue something like this. IMO we should still wait until there's an unambiguous need for this convenience API, and not just a "would-be-nice" desire.
I see how the use case of adding tags to a Span might be the most common for library/agent creators and it usually happens when creating a new Span as @bhs mentioned and there is no need to null-check, but it seems to me like some other common use cases are not being taken into account on this discussion:
- Add/Get baggage items from/to the ActiveSpan, that's something of great value to end users and seems unlikely to happen on library/agent code, but with the current implementation the end user would need to null-check before adding baggage.
- Adding small hacks to properly trace non-instrumented libraries or application code. This just happened to me. I needed to capture a Continuation that will be later activated in another Thread due to how certain specific app is implemented and started to have NPEs because some calls to this code come from non-instrumented parts of the app. A null-check had to be added.
- Running apps without instrumentation. If the end user has code to do any operation (logs/tags/baggage) on the active span and instrumentation was not available to create the active span in the first place (e.g. when running tests) then the only safe way to do these operations is to null-check the active span.
In Kamon we don't have a lot of experience with distributed tracing, but we do have a couple stories on implementing in-process context propagation and from that I can tell that it is common to see users using baggage and that an active span is not necessarily there when it happens (usually due to disabled/unavailable/flaky instrumentation).. and you don't want your monitoring code to kill your request with a NPE.
Doing nothing means accepting that the end user will have to null-check every use of the active span and given that there is no way to enforce the check then NPEs are bound to happen. IMHO something should be done about this since this is not about making the API nicer but about making it safer to use.
With regards to how to solve this issue, I would go with option 3 from the initial comment. Looking forward to your comments!
The difference between options 2 and 3 is the ability to distinguish the NoSpan from a real span. There were a couple comments saying that instrumentation may need this signal, but frankly I am not convinced that it does. For example,
In the most situations if there is no active span new span will be created.
That does not require any specific check against the NoSpan - just use with a childOf(), and the tracer can figure out that it's a new trace because the parent is empty.
I want to make sure that given thing is recorded, create span if it doesn't exist yet
Same as above.
I want to record something, but if only there is an active span
Perhaps, only as a performance optimization, similar to logger.IsDebugEnabled()
.
But all three cases can be just as well solved by tracer.hasActiveSpan()
, instead of tracer.activeSpan() == NoSpan.INSTANCE
.
So my preference would be option 2, where the tracer returns something not null, but not explicitly telling that it's something special, and if the app needs to know it can check tracer.hasActiveSpan()
. The reason is because there is another impact of this change - at some point later the application might call tracer.inject(tracer.activeSpan(), ...)
- if we allow the tracer to control what to return when there's no active span, it can use any marker for that object to be able to tell that it is a fake span in other places, like inject
or childOf
. And it leaves the span API unchanged.
Option 3 is also ok, but it's more intrusive imho.
If we're going to make a change, my favorite option is still something like @mabn's comment:
tracer.withActiveSpan(span -> span.setTag(...));
This has the added benefit of not-executing any of the code or evaluating any of the params in the closure function if there isn't an active span, and we could document that it's permissible for withActiveSpan
to be a noop if the active span is a noop (i.e., if it's not a sampled trace).
+1 for withSpan
I like withActiveSpan, I'll take functional over imperative any time. It does have a couple of small drawbacks:
- In most cases the lambda will be referencing variables from enclosing scope, which means a closure object will almost always be allocated for each withActiveSpan() call (the same would happen with Optional type).
- The method is a helper, not an implementation detail of the tracer. In Scala or C# I would've done it as an extension method rather than a core API.
@yurishkuro re your concern #2
: I suppose we could put the implementation for the helper in the util
module (as a static method that takes a Tracer and closure as params) and "strongly encourage" Tracer impls to defer to it.
Your concern #1
boils down to a performance optimization. I agree that there's an issue there, but it's always possible to use activeSpan()
and a manual null
check. And I like that withActiveSpan()
does not add any significant new concepts to the API (as opposed to a NoSpan.INSTANCE
) and doesn't require conditionals like with hasActiveSpan()
.