cpp_client_telemetry icon indicating copy to clipboard operation
cpp_client_telemetry copied to clipboard

Add Distributed Tracing semantics

Open maxgolov opened this issue 3 years ago • 3 comments

Add Distributed Tracing semantics to the following APIs:

  • ISemanticContext - allows to set (optional) Distributed Tracing context on all events emitted by ILogger
  • ability to use env_dt aliases to customize TraceContext of individual events running in a given scope. That allows manual nesting of a child span to a parent span.
  • semantic context projection for C++/CL/CX apps on Windows
  • add test that emits two events with DT semantics

PR adds "common" fields:

  • env_dt_traceId
  • env_dt_spanId
  • env_dt_traceFlags
  • Part B parentId

And the set of corresponding high-level API setters at context-level:

  • SetTraceId
  • SetSpanId
  • SetTraceFlags
  • SetParentId

which are styled in the same way as other common context setters are presently styled, in uniform fashion.

How to use it:

  • events land in Aria-Kusto (Aria schema) or Custom Kusto (common schema), and could be subsequently routed to Geneva via 1DS-Interchange (Geneva schema).
  • events landing in Geneva have the required ext.dt extension semantics, just enough for the Trace Explorer view to work.. Less duration semantics: since 1DS in principle does not require client events to have "duration", start / end / scope. Events are timepoints that carry timestamp and name. How exactly to remap these, i.e. what would/could/should be considered duration - is out-of-scope of this PR. A separate routing example would be provided elsewhere, out-of-scope for Client Telemetry repo.

How this could be improved further:

  • There's no helper for formatting individual W3C TraceContext fields here. Users may use whatever standard formatter of their choice, e.g. taking a dependency on OpenTelemetry API for well-formed context of trace parent header values. In our org scenario we don't need it - since we are utilizing existing .NET APIs that already give us the access to values in the proper format. So we're only calling into 1DS C++ SDK via projection to set and send the well-formed values we need. We do not require manual formatting of these values in C++ land at all.
  • Should someone need to learn how to do this formatting in C++ - please refer to the OpenTelemetry C++ code here, that illustrates how to format TraceId and SpanId to hexadecimal string values.

Example showing how events look when traced from client to service in one Kusto table: image

APITest verifies the basic functionality. More complex usage examples are not presently a priority for our team, since in our case we use the higher-level .NET StartActivity API while interacting with remote service instrumented with OpenTelemetry, and supplement the existing client telemetry events with correlation identifiers that follow Common Schema naming, specifically Common Schema -to- Geneva mapping.

Separate additional work could be done later - to add Span and Log semantics, allowing to use 1DS SDK as a lower-level transport channel / exporter for OpenTelemetry SDK. Presently this additional work is out-of-scope.

maxgolov avatar Oct 31 '22 23:10 maxgolov

Most of the attributes/fields defined in the ISemanticContext interface are not going to change after bootstrapping of SDK. Does Distributed Tracing attributes fit in this category - as trace_id/ span_id could change based on the context where Logger::LogEvent() is called.

Should we instead have an overloaded Logger::LogEvent method which accepts trace context. Something similar to how it is done in OpenTelemetry logging api - https://github.com/open-telemetry/opentelemetry-cpp/blob/389b84f4303f728c97dcf8194f76277488e18cf4/api/include/opentelemetry/logs/logger.h#LL59C3-L59C3

lalitb avatar Nov 14 '22 07:11 lalitb

@maxgolov - Do you plan to address the concerns raised in this PR, and make it ready for merge. Thanks.

lalitb avatar Jan 06 '23 18:01 lalitb

@lalitb I'd like to address some comments from reviewers, i.e.

  • adding a method that allows to set both TraceId+SpanId at once
  • possibly exposing a method like this to higher-level apps (I need it to work via C API too)

Apologies for keeping it open for so long. It's coming soon.

maxgolov avatar Feb 14 '23 04:02 maxgolov