cf-abacus

Distributed Tracing

Open rajkiranrbala opened this issue 8 years ago • 3 comments

We have six microservices (4 in the pipeline and 2 plugins) involved in processing a usage record. To trace a request's lifetime through the pipeline, we need to enable distributed tracing in our microservices.

rajkiranrbala avatar Jun 03 '16 18:06 rajkiranrbala

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/120862389

The labels on this github issue will be updated when the story is started.

cf-gitbot avatar Jun 03 '16 18:06 cf-gitbot

Hi @rajkiranrbala ,

Tracing is definitely something we need to look into and implement at some point in time, especially since Abacus is continuously growing in complexity and number of microservices.

Still, there are some problems and things to be considered.

Options

I took a look at the open source options available, and Zipkin and Jaeger seem to be the prominent choices.

After some experimentation with both, Jaeger seems the better choice. The Zipkin UI seems outdated and less user-friendly: it regularly forgets filter preferences and has bugs. Jaeger is also nicer to develop against; because it is based on the OpenTracing standard, it allows for much cleaner code.

Instrumentation

I did a quick PoC to see what it would look like. You can find the code here: https://github.com/SAP/cf-abacus/tree/tracing

Note that not everything is wired in that PoC, but the main usage flow (if you run npm run demo) produces a trace through collector, meter, accumulator, aggregator with some custom spans inside.

I found the following to be a bit problematic:

  • We need to pass a context object all around Abacus in order to support full tracing. I did try continuation-local-storage and similar solutions, but they fail with generator-based async flows, which Abacus uses. In the future we may also move to async/await, which those solutions do not support either.
  • Abacus makes heavy use of batch endpoints, which combine multiple REST calls into a single one. This does not play well with tracing, since the actual endpoint is hidden in the body of the request. Furthermore, I could not get fan-in to work correctly in Jaeger.
  • Abacus makes heavy use of batch wrappers around function calls. This again produces fan-in flows.
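
To illustrate the first point, here is a minimal sketch of explicit context passing through a generator-based flow. It does not use the Jaeger or OpenTracing API: `startSpan`, `finishSpan`, `run` and the `collect`/`meter` steps are hypothetical stand-ins, and the real PoC code in the branch above may look quite different.

```javascript
'use strict';

// Hypothetical in-memory tracer, only for illustration (not a real library API)
const finished = [];
let nextId = 1;

const startSpan = (name, parent) => {
  const id = nextId++;
  return {
    id,
    name,
    traceId: parent ? parent.traceId : id,
    parentId: parent ? parent.id : null
  };
};

const finishSpan = (span) => finished.push(span);

// Generator-based step: the tracing context is an explicit first argument,
// because there is no implicit (continuation-local) storage to rely on
function* meter(ctx, usage) {
  const span = startSpan('meter', ctx.span);
  try {
    const quantity = yield Promise.resolve(usage.quantity * 2);
    return { quantity };
  } finally {
    finishSpan(span);
  }
}

function* collect(ctx, usage) {
  const span = startSpan('collect', ctx.span);
  try {
    // Hand a child context down to the next stage
    return yield* meter({ span }, usage);
  } finally {
    finishSpan(span);
  }
}

// Minimal runner for generators that yield promises (stand-in for a co-style runner)
const run = (gen) => new Promise((resolve, reject) => {
  const step = (method, value) => {
    let result;
    try {
      result = gen[method](value);
    } catch (e) {
      return reject(e);
    }
    if (result.done) return resolve(result.value);
    return Promise.resolve(result.value)
      .then((v) => step('next', v), (e) => step('throw', e));
  };
  step('next');
});

// Usage: start a trace with an empty root context
run(collect({ span: null }, { quantity: 3 }))
  .then((result) => console.log(result, finished.map((s) => s.name)));
```

The cost is visible in the signatures: every function on the path needs the extra `ctx` parameter, which is exactly the intrusiveness described above.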

Deployment

I have not had a chance to look into this topic.

Nevertheless, Jaeger requires an Agent running alongside each of the Abacus microservices. We need to see how this is most easily achieved within Cloud Foundry.


If you have any remarks or experience on the topic, your feedback would be appreciated.

ghost avatar Nov 06 '17 11:11 ghost

Another option here might be to use commercial solutions like Dynatrace / AppDynamics that can instrument the code during staging. This would give us basic tracing, and we could later build custom spans on top.

We might want to add an abstraction layer that makes it possible to switch tracing solutions.
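
As a rough sketch of what such an abstraction layer could look like: application code talks only to a small facade, and each tracing solution is plugged in as a backend. All names here (`createTracer`, `tracerFor`, the `noop` and `memory` backends, the `TRACER` environment variable) are hypothetical; a Jaeger or Zipkin adapter would have to implement the same two functions, which is not shown.

```javascript
'use strict';

// Backend that records nothing; a safe default when tracing is disabled
const noopBackend = {
  start: () => null,
  finish: () => {}
};

// Backend that collects finished spans into an array (useful for tests)
const memoryBackend = (sink) => ({
  start: (name, parent) => ({ name, parent: parent || null, startedAt: Date.now() }),
  finish: (span) => sink.push(Object.assign({ finishedAt: Date.now() }, span))
});

// Facade used by application code; it never touches a backend directly
const createTracer = (backend) => ({
  startSpan: (name, parentSpan) => {
    const raw = backend.start(name, parentSpan ? parentSpan.raw : undefined);
    return { raw, finish: () => backend.finish(raw) };
  }
});

// Backend selection by name; unknown names fall back to the no-op backend
const backends = {
  noop: () => noopBackend,
  memory: (sink) => memoryBackend(sink)
};

const tracerFor = (name, sink) =>
  createTracer((backends[name] || backends.noop)(sink));

// Usage: the TRACER variable name is an assumption made for this sketch
const spans = [];
const tracer = tracerFor(process.env.TRACER, spans);
const root = tracer.startSpan('collector');
tracer.startSpan('meter', root).finish();
root.finish();
```

Switching solutions then means adding one more entry to `backends`, without touching the instrumented code.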

hsiliev avatar Feb 14 '18 09:02 hsiliev