finagle icon indicating copy to clipboard operation
finagle copied to clipboard

Let's support non-RPC spans

Open codefromthecrypt opened this issue 7 years ago • 10 comments

The current tracing system is special-cased for client and server spans. There are non-rpc spans, such as expensive local code or messaging activity that are not currently representable in finagle. Let's make them possible!

Expected behavior

I'd like to be able to indicate the start and finish of an abstract operation, without relying on annotation values. For example:

Trace.recordStart() -> Tracer.record(Annotation.Start())
Trace.recordOperation(message: operation) -> Tracer.record(Annotation.Operation(operation))
Trace.recordFinish() -> Tracer.record(Annotation.Finish())
Trace.recordError(message: String) -> Tracer.record(Annotation.Error(message))

These annotations control the authoritative start and finish (success or error) of an abstract span. Annotation.Operation is the same use as Annotation.Rpc, except it more clearly indicates it isn't necessarily an RPC (ex a one-way message).

Actual behavior

Right now, you can only authoritatively create and finish RPC spans. If you create a different type of span, you end up with a "finagle.flush" annotation 2 minutes later as the tracer has no idea that the span is complete. It is also impossible to create a span that only includes binary annotations.

codefromthecrypt avatar Aug 09 '16 13:08 codefromthecrypt

cc'ing some zipkin-finagle peeps: @mosesn @eirslett @sveinnfannar @kristofa

codefromthecrypt avatar Aug 09 '16 13:08 codefromthecrypt

just had a chat with @capitanbatata about this. any thoughts about this proposal (or other ways to make intermediate spans?) it seems a nice and incremental win for those who want to model things like a dao layer.

codefromthecrypt avatar Feb 02 '17 13:02 codefromthecrypt

seems like a useful feature. what happens if more than one Start nests? would it help to take an identifier?

kevinoliver avatar Feb 07 '17 01:02 kevinoliver

seems like a useful feature. what happens if more than one Start nests? would it help to take an identifier?

I suspect it would be similar to if multiple ClientSend() were called. ideally, duplicates would be ignored.

Right now, the status quo for local spans is pretending they are clients, so this is a pretty good way to think through things.

cc @capitanbatata

codefromthecrypt avatar Feb 09 '17 02:02 codefromthecrypt

I noticed a finagle tracer port by criteo has LocalOperationStart/Stop

https://github.com/criteo/zipkin4net/tree/master/zipkin4net/Criteo.Profiling.Tracing/Annotation

codefromthecrypt avatar May 02 '17 01:05 codefromthecrypt

whatever we do here, we should probably do consistently with other finagle tracer ports like zipkin-js and criteo's tracer cc @wirehead @eirslett @fedj

codefromthecrypt avatar May 02 '17 01:05 codefromthecrypt

In zipkin4net, we ended up doing LocalOperationStart and LocalOperationStop for two reasons:

  • It allows to easily nest one operation into another without having to care about the ids in the client code
  • It follows the RPC span pattern having a start and a stop operation

The cons were that it's not following directly serialization annotation pattern. We thought that the serialization can easily change whereas APIs are hard to change.

fedj avatar May 02 '17 06:05 fedj

howdy we are about to merge the second identical implementation of this (after zipkin4net which is also a finagle tracer clone) https://github.com/openzipkin/zipkin-js/pull/156

Local tracing

Sometimes you have activity that precedes a remote request that you want to capture in a trace. tracer.local can time an operation, placing a corresponding span ID in scope so that any downstream commands end up in the same trace.

const result = tracer.local('checkout', () => {
  return someComputation();
});

This is supported underneath by two signaling annotations:

function LocalOperationStart(name) {
   this.name = name;
 }
 LocalOperationStart.prototype.toString = function() {
   return `LocalOperationStart("${this.name}")`;
 };
 
 class LocalOperationStop extends SimpleAnnotation {}

The first indicates the span name for use. In the recorder, we look at these and simply set span.timestamp/duration accordingly.

Sound like a decent fit for upstream? (here)

codefromthecrypt avatar Nov 14 '17 06:11 codefromthecrypt

Sure, that seems reasonable to me.

mosesn avatar Nov 14 '17 16:11 mosesn

Hey guys! Any update on this?

LiuVII avatar Jan 16 '19 17:01 LiuVII