finagle
finagle copied to clipboard
Let's support non-RPC spans
The current tracing system is special-cased for client and server spans. There are non-rpc spans, such as expensive local code or messaging activity that are not currently representable in finagle. Let's make them possible!
Expected behavior
I'd like to be able to indicate the start and finish of an abstract operation, without relying on annotation values. For example:
Trace.recordStart() -> Tracer.record(Annotation.Start())
Trace.recordOperation(message: operation) -> Tracer.record(Annotation.Operation(operation))
Trace.recordFinish() -> Tracer.record(Annotation.Finish())
Trace.recordError(message: String) -> Tracer.record(Annotation.Error(message))
These annotations control the authoritative start and finish (success or error) of an abstract span. Annotation.Operation is the same use as Annotation.Rpc, except it more clearly indicates it isn't necessarily an RPC (ex a one-way message).
Actual behavior
Right now, you can only authoritatively create and finish RPC spans. If you create a different type of span, you end up with a "finagle.flush" annotation 2 minutes later as the tracer has no idea that the span is complete. It is also impossible to create a span that only includes binary annotations.
cc'ing some zipkin-finagle peeps: @mosesn @eirslett @sveinnfannar @kristofa
just had a chat with @capitanbatata about this. any thoughts about this proposal (or other ways to make intermediate spans?) it seems a nice and incremental win for those who want to model things like a dao layer.
seems like a useful feature. what happens if more than one Start nests? would it help to take an identifier?
seems like a useful feature. what happens if more than one Start nests? would it help to take an identifier?
I suspect it would be similar to if multiple ClientSend() were called. ideally, duplicates would be ignored.
Right now, the status quo for local spans is pretending they are clients, so this is a pretty good way to think through things.
cc @capitanbatata
I noticed a finagle tracer port by criteo has LocalOperationStart/Stop
https://github.com/criteo/zipkin4net/tree/master/zipkin4net/Criteo.Profiling.Tracing/Annotation
whatever we do here, we should probably do consistently with other finagle tracer ports like zipkin-js and criteo's tracer cc @wirehead @eirslett @fedj
In zipkin4net, we ended up doing LocalOperationStart and LocalOperationStop for two reasons:
- It allows to easily nest one operation into another without having to care about the ids in the client code
- It follows the RPC span pattern having a start and a stop operation
The cons were that it's not following directly serialization annotation pattern. We thought that the serialization can easily change whereas APIs are hard to change.
howdy we are about to merge the second identical implementation of this (after zipkin4net which is also a finagle tracer clone) https://github.com/openzipkin/zipkin-js/pull/156
Local tracing
Sometimes you have activity that precedes a remote request that you want to
capture in a trace. tracer.local
can time an operation, placing a
corresponding span ID in scope so that any downstream commands end up in the
same trace.
const result = tracer.local('checkout', () => {
return someComputation();
});
This is supported underneath by two signaling annotations:
function LocalOperationStart(name) {
this.name = name;
}
LocalOperationStart.prototype.toString = function() {
return `LocalOperationStart("${this.name}")`;
};
class LocalOperationStop extends SimpleAnnotation {}
The first indicates the span name for use. In the recorder, we look at these and simply set span.timestamp/duration accordingly.
Sound like a decent fit for upstream? (here)
Sure, that seems reasonable to me.
Hey guys! Any update on this?