jaeger
[Feature]: Visualize uninstrumented services in the dependency diagrams
Requirement
Visualize services in the dependency diagram even when they are not instrumented, but known from the caller side.
Problem
When a trace contains leaf nodes that represent outbound calls to uninstrumented services, those services are not shown in the dependency diagram (e.g. see how Zipkin shows them in #3803).
Proposal
Jaeger can infer that a callee service exists when the caller service logs a span with the tag span.kind=client without a corresponding span.kind=server span.
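The inference rule above can be sketched in a few lines. This is a minimal sketch, not Jaeger code: the Span type and the uninstrumentedCalls function are hypothetical simplifications of Jaeger's span model, and a client span counts as "uninstrumented" when no span in the same trace has it as parent with span.kind=server.

```go
package main

import "fmt"

// Span is a hypothetical, simplified span model used only for illustration;
// it is not Jaeger's model.Span.
type Span struct {
	SpanID       string
	ParentSpanID string // empty for root spans
	Service      string
	Kind         string // value of the span.kind tag, e.g. "client" or "server"
	Tags         map[string]string
}

// uninstrumentedCalls returns the client spans that have no child span with
// span.kind=server, i.e. calls whose callee did not report its own span.
func uninstrumentedCalls(spans []Span) []Span {
	// Record which span IDs have at least one server-kind child.
	hasServerChild := make(map[string]bool)
	for _, s := range spans {
		if s.Kind == "server" && s.ParentSpanID != "" {
			hasServerChild[s.ParentSpanID] = true
		}
	}
	var out []Span
	for _, s := range spans {
		if s.Kind == "client" && !hasServerChild[s.SpanID] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	trace := []Span{
		{SpanID: "1", Service: "frontend", Kind: "server"},
		{SpanID: "2", ParentSpanID: "1", Service: "frontend", Kind: "client"},
		{SpanID: "3", ParentSpanID: "2", Service: "orders", Kind: "server"},
		{SpanID: "4", ParentSpanID: "1", Service: "frontend", Kind: "client",
			Tags: map[string]string{"peer.service": "redis"}},
	}
	for _, s := range uninstrumentedCalls(trace) {
		fmt.Println(s.SpanID, s.Tags["peer.service"]) // prints "4 redis"
	}
}
```

Span 2 is matched by the server span 3 from "orders", so only span 4 is reported as a call to an uninstrumented service.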
There are several places in the code base where this will need to be accounted for:
- in the in-memory storage used by all-in-one (the easiest to start with)
- in the Flink/Spark jobs for production usage
Aside from changing the graph logic, another alternative is a trace enrichment step that adds artificial server spans to the trace. Then the graph-building logic would not need to change at all, and the inferred nodes could also be shown in the single-trace views.
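To illustrate why enrichment leaves the graph logic untouched, here is a minimal sketch under stated assumptions: the Span type, the enrich and dependencies functions, the "-inferred" span ID suffix, and the "unknown" fallback name are all hypothetical. The dependencies function is a plain parent/child counter that knows nothing about inferred spans, yet it picks up the synthetic edge.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, simplified span model (not Jaeger's model.Span).
type Span struct {
	SpanID, ParentSpanID, Service, Kind string
	Tags                                map[string]string
}

// enrich appends one artificial server span under every client span that has
// no server-kind child. The "-inferred" ID suffix and the "unknown" fallback
// name are illustrative assumptions.
func enrich(spans []Span) []Span {
	hasServerChild := map[string]bool{}
	for _, s := range spans {
		if s.Kind == "server" {
			hasServerChild[s.ParentSpanID] = true
		}
	}
	out := append([]Span(nil), spans...) // copy so the input is not modified
	for _, s := range spans {
		if s.Kind != "client" || hasServerChild[s.SpanID] {
			continue
		}
		name := s.Tags["peer.service"]
		if name == "" {
			name = "unknown"
		}
		out = append(out, Span{
			SpanID:       s.SpanID + "-inferred",
			ParentSpanID: s.SpanID,
			Service:      name,
			Kind:         "server",
		})
	}
	return out
}

// dependencies is an ordinary parent -> child edge counter: it counts calls
// between different services and has no special handling for inferred spans.
func dependencies(spans []Span) map[string]int {
	byID := map[string]Span{}
	for _, s := range spans {
		byID[s.SpanID] = s
	}
	links := map[string]int{}
	for _, s := range spans {
		if p, ok := byID[s.ParentSpanID]; ok && p.Service != s.Service {
			links[p.Service+" -> "+s.Service]++
		}
	}
	return links
}

func main() {
	trace := []Span{
		{SpanID: "1", Service: "frontend", Kind: "server"},
		{SpanID: "2", ParentSpanID: "1", Service: "frontend", Kind: "client",
			Tags: map[string]string{"peer.service": "redis"}},
	}
	links := dependencies(enrich(trace))
	keys := make([]string, 0, len(links))
	for k := range links {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, links[k]) // prints "frontend -> redis 1"
	}
}
```

Because the inferred node is a real (if synthetic) span, the same enriched trace could also be rendered in the single-trace view.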
Open questions
Deciding what to call the missing callee services can be tricky. We will need to implement a heuristic that derives the name from some of the tags of the client span:
- based on OpenTracing semantic conventions
  - peer.service
  - peer.address
  - peer.ip? + peer.port
- based on similar OTEL semantic conventions
There was once a discussion in OTEL about labeling the type of downstream service (e.g. an SQL database), which could also be taken into account when naming the derived services.
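A naming heuristic along these lines might look like the sketch below. The tag keys come from the OpenTracing conventions (peer.service, peer.hostname, peer.address, peer.ipv4, peer.ipv6, peer.port, db.type) and the OTEL conventions (peer.service, db.system, server.address); the inferServiceName and firstNonEmpty helpers, the priority order, and the "type@host" output format are assumptions made for illustration, not an agreed design.

```go
package main

import (
	"fmt"
	"net"
)

// inferServiceName derives a display name for an uninstrumented callee from
// the tags of the client span. Priority order and output format are
// assumptions for this sketch.
func inferServiceName(tags map[string]string) string {
	// 1. An explicit peer service name wins.
	if v := tags["peer.service"]; v != "" {
		return v
	}
	// 2. Database type: OTEL db.system, else OpenTracing db.type.
	dbType := firstNonEmpty(tags, "db.system", "db.type")
	// 3. A host name or address, falling back to IP plus optional port.
	host := firstNonEmpty(tags, "server.address", "peer.hostname", "peer.address")
	if host == "" {
		if ip := firstNonEmpty(tags, "peer.ipv4", "peer.ipv6"); ip != "" {
			host = ip
			if port := tags["peer.port"]; port != "" {
				host = net.JoinHostPort(ip, port) // brackets IPv6 addresses
			}
		}
	}
	switch {
	case dbType != "" && host != "":
		return dbType + "@" + host
	case dbType != "":
		return dbType
	case host != "":
		return host
	}
	return "unknown"
}

// firstNonEmpty returns the value of the first key with a non-empty value.
func firstNonEmpty(tags map[string]string, keys ...string) string {
	for _, k := range keys {
		if v := tags[k]; v != "" {
			return v
		}
	}
	return ""
}

func main() {
	fmt.Println(inferServiceName(map[string]string{"peer.service": "billing"}))                        // billing
	fmt.Println(inferServiceName(map[string]string{"db.system": "postgresql", "server.address": "db1"})) // postgresql@db1
	fmt.Println(inferServiceName(map[string]string{"peer.ipv4": "10.0.0.5", "peer.port": "6379"}))      // 10.0.0.5:6379
	fmt.Println(inferServiceName(nil))                                                                   // unknown
}
```

Prefixing the database type is one way to surface the "type of downstream service" idea in the node name; another option would be to keep the type as a separate node attribute.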
Hi @yurishkuro,
I like your idea of just adding it to the trace enrichment. Can you point me to where the code for this lives, so I can try my luck implementing it? :) From a first look at the code, I didn't find the right place.
@paule96 the dependencies calculation done by all-in-one happens inside the memory storage:
https://github.com/jaegertracing/jaeger/blob/711488f94f25d4a5e7900d0ba450ecb5bfdfda7b/plugin/storage/memory/memory.go#L63
It may not be very straightforward to combine this with the enricher idea, because the dependencies logic does not query for traces; it accesses them directly.
@yurishkuro I am planning to look into this issue. I will share my findings here and also create a WIP PR. Also, please let me know if there is anything I should be aware of before making changes for this issue.
Hi @yurishkuro, I noticed that this issue is still open. Is it currently being worked on? If not, I would like to take it up and contribute a fix. Please let me know if there are any requirements I should be aware of before making changes for this issue.
Hi @yurishkuro, I’ve been looking into this issue and have come up with a preliminary approach to address the visualization of uninstrumented services in the dependency diagram. I would appreciate your feedback and insights to ensure alignment with Jaeger’s design principles and performance expectations.
1. In-Memory Storage Modification:
- Enhance the GetDependencies function to identify spans with span.kind=client that lack corresponding span.kind=server spans, indicating uninstrumented callee services.
- Introduce a mechanism to create and add artificial server spans to represent these uninstrumented services in the in-memory store.
- Ensure these artificial spans are considered in dependency calculations and visualizations.
2. Flink/Spark Jobs Modification (for Production Usage):
- Implement parallel logic within the Flink/Spark jobs that mirrors the in-memory storage modification, ensuring uninstrumented services are visualized in production environments as well.
Alternative - Trace Enrichment:
- Develop a separate process or module that can dynamically add artificial server spans to existing traces when uninstrumented callee services are detected. This would be an alternative to modifying the core dependency graph logic or the in-memory/Flink/Spark storage.
- Ensure that these artificially added spans are seamlessly integrated into the existing visualization tools, so they appear naturally in the dependency diagrams and single-trace views.
I am eager to kick-start the implementation once I have your feedback, along with any additional insights or considerations that should be taken into account to align with Jaeger’s existing architecture.
@nidhey27 I don't completely follow your write-up. E.g. in step 1, are you presenting multiple options, or do you think all 3 steps need to be done?
I wouldn't go with a trace enrichment approach. The basic logic in both implementations is to construct a tree of spans and walk it while outputting parent-child links. In that algorithm, it's pretty easy to add an extra conditional branch to handle leaf CLIENT spans.
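The tree walk described above, with the extra branch for leaf CLIENT spans, might look roughly like this. It is a minimal sketch, not the actual Go or Java implementation: the Span and Link types and the links function are hypothetical, and naming the inferred callee from peer.service with an "unknown" fallback is only a placeholder for the heuristic discussed below.

```go
package main

import "fmt"

// Span is a hypothetical, simplified span model (not Jaeger's model.Span).
type Span struct {
	SpanID, ParentSpanID, Service, Kind string
	Tags                                map[string]string
}

// Link is a caller -> callee edge in the dependency graph.
type Link struct{ Parent, Child string }

// links builds the span tree and walks it depth-first from the root spans,
// counting every parent/child pair that crosses a service boundary. Spans
// whose parent is missing from the trace are not reached in this sketch.
func links(spans []Span) map[Link]int {
	children := map[string][]Span{}
	var roots []Span
	for _, s := range spans {
		if s.ParentSpanID == "" {
			roots = append(roots, s)
		} else {
			children[s.ParentSpanID] = append(children[s.ParentSpanID], s)
		}
	}
	out := map[Link]int{}
	var walk func(s Span)
	walk = func(s Span) {
		kids := children[s.SpanID]
		// Extra branch: a CLIENT span with no children is treated as a call
		// to an uninstrumented service (placeholder naming heuristic).
		if s.Kind == "client" && len(kids) == 0 {
			callee := s.Tags["peer.service"]
			if callee == "" {
				callee = "unknown"
			}
			out[Link{s.Service, callee}]++
		}
		for _, c := range kids {
			if c.Service != s.Service {
				out[Link{s.Service, c.Service}]++
			}
			walk(c)
		}
	}
	for _, r := range roots {
		walk(r)
	}
	return out
}

func main() {
	trace := []Span{
		{SpanID: "1", Service: "frontend", Kind: "server"},
		{SpanID: "2", ParentSpanID: "1", Service: "frontend", Kind: "client"},
		{SpanID: "3", ParentSpanID: "2", Service: "orders", Kind: "server"},
		{SpanID: "4", ParentSpanID: "3", Service: "orders", Kind: "client",
			Tags: map[string]string{"peer.service": "mysql"}},
	}
	// Map iteration order is random, so the two links may print in any order.
	for l, n := range links(trace) {
		fmt.Printf("%s -> %s: %d\n", l.Parent, l.Child, n)
	}
}
```

For this trace the walk yields frontend -> orders and orders -> mysql, each with a count of 1; the second edge exists only because of the leaf-CLIENT branch.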
The other issue you will run into is figuring out the name of the destination service from the client span. Sometimes it may have a peer.service tag, but not always. You will likely need to build a bit of heuristic to infer the destination from a combination of tags on the client span. This will probably require understanding a number of semantic conventions, e.g. to detect that it's a database call (and specifically which database). Unfortunately, this logic will be far more complicated than the piece I mentioned above, and, even more unfortunately, we have independent Go and Java implementations of it.
@yurishkuro I had initially set my sights on inserting artificial server spans into the in-memory store to represent those uninstrumented services, and then bringing these artificial spans into play during dependency calculations and visualizations.
However, having absorbed your insights, I’ve pivoted my approach for something a bit more streamlined and, dare I say, elegant.
- Modify the existing algorithm to precisely pinpoint leaf CLIENT spans lacking corresponding SERVER spans.
- Inject a conditional branch to process these identified CLIENT spans seamlessly, ensuring the visualization of uninstrumented services is accurate and efficient.
As for the heuristic approach to nail down the destination service name, that’s still a work in progress; I’m exploring and weighing my options there.
Could you please confirm if this approach is in sync with your expectations, or is there room for some tweaks?
The approach makes sense to me.