Add OpenTelemetry to increase observability

Open girishc13 opened this issue 1 year ago • 1 comments

Describe the feature

Improve observability using OpenTelemetry and the available SDK implementations to:

  • enable tracing of requests
  • standardize the already available Prometheus metrics collection

The OpenTelemetry API is needed to enable measurement, and the SDK is required to collect and export metrics.

Standard APIs are available for monitoring different operations at varying granularity, which reduces the effort of integrating with different metrics collection and aggregation vendors. The feature is optional and can be configured independently by users, depending on the telemetry collector implementation.

Your proposal

Add OpenTelemetry APIs and standard SDKs to:

  • trace network requests within the Flow ecosystem
  • convert the existing Prometheus collection and export to the new OpenTelemetry-compatible metrics collector
  • allow users to better track operations in the Executors

Available packages:

  • https://pypi.org/project/opentelemetry-api/
  • https://pypi.org/project/opentelemetry-sdk/
  • https://pypi.org/project/opentelemetry-instrumentation-grpc/

Environment

Screenshots

girishc13 avatar Sep 09 '22 08:09 girishc13

Below are some points to consider for introducing the first version of OpenTelemetry support.

Configuration Options

~1. Use JINA_ENABLE_OTEL_TRACING to enable tracing everywhere.~
~2. Use JINA_ENABLE_OTEL_METRICS to enable metrics everywhere.~
~3. The user could override the environment variables at the Gateway or Executor level.~

  1. Provide pod-level parser options:
    • '--opentelemetry-tracing' to enable tracing.
    • '--opentelemetry-metrics' to enable metrics.
  2. Add the above two options to the client parser to enable OpenTelemetry tracing and metrics at the client level.
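
The two parser options above could be wired up roughly as follows. This is a minimal sketch using the standard library's argparse, not the actual Jina parser code: the function name build_parser, the program name, the help strings, and the choice of boolean store_true flags defaulting to off are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical pod/client parser exposing the two proposed flags."""
    parser = argparse.ArgumentParser(prog="pod-sketch")
    # Assumption: both features are opt-in boolean switches, off by default.
    parser.add_argument(
        "--opentelemetry-tracing",
        action="store_true",
        default=False,
        help="Enable OpenTelemetry tracing.",
    )
    parser.add_argument(
        "--opentelemetry-metrics",
        action="store_true",
        default=False,
        help="Enable OpenTelemetry metrics.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--opentelemetry-tracing"])
    # argparse maps dashes to underscores in attribute names.
    print(args.opentelemetry_tracing, args.opentelemetry_metrics)  # prints: True False
```

Because the same two arguments are plain argparse options, the client parser could reuse them without a separate configuration mechanism.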

Package

  • Name the package instrumentation to provide a clear separation from the existing telemetry and Prometheus client.
  • ~This package will provide the global TRACER and METER classes that will:~
    • ~provide helper methods to create a span from a parent span if one exists, otherwise create a standalone span.~
    • ~provide helper methods to create instruments from the metrics provider.~
    • The package will contain the InstrumentationMixin, which instantiates the tracer and metrics providers based on the self.args argument. This mixin can be added to any class that needs to create a trace or measure an operation.
    • The InstrumentationMixin has been added to BaseClient and AsyncNewLoopRuntime, which serve as the base for the Client, Gateway, and Runtime abstractions.
    • Further, the InstrumentationMixin will provide static objects and methods for grpc.aio interceptors that trace gRPC servers and channels. The grpc.aio interceptors are provided in the instrumentation package because the official opentelemetry-python-contrib package doesn't yet support grpc.aio.Server abstractions. They can be removed once the contrib package adds the required support.
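
The mixin idea described above could look roughly like the following. This is only a sketch of the pattern: SketchInstrumentationMixin, _SketchTracer, and SketchRuntime are hypothetical names, and the stand-in tracer merely records span names in memory so the example runs without the OpenTelemetry SDK. A real implementation would presumably build tracer and meter providers from the SDK instead.

```python
from argparse import Namespace
from contextlib import contextmanager


class _SketchTracer:
    """Stand-in for a real tracer: records span names instead of exporting them."""

    def __init__(self, enabled: bool):
        self.enabled = enabled
        self.finished = []

    @contextmanager
    def start_span(self, name: str):
        try:
            yield name
        finally:
            # Only keep a record when tracing is switched on.
            if self.enabled:
                self.finished.append(name)


class SketchInstrumentationMixin:
    """Hypothetical mixin that configures instrumentation from ``self.args``."""

    def _setup_instrumentation(self) -> None:
        # Assumption: the flag name mirrors the --opentelemetry-tracing option.
        enabled = getattr(self.args, "opentelemetry_tracing", False)
        self.tracer = _SketchTracer(enabled=enabled)


class SketchRuntime(SketchInstrumentationMixin):
    """Hypothetical runtime that opts in to instrumentation via the mixin."""

    def __init__(self, args: Namespace):
        self.args = args
        self._setup_instrumentation()

    def handle(self, request: str) -> str:
        with self.tracer.start_span("handle"):
            return request.upper()


if __name__ == "__main__":
    runtime = SketchRuntime(Namespace(opentelemetry_tracing=True))
    runtime.handle("ping")
    print(runtime.tracer.finished)
```

Keeping the setup in a mixin means the client and the runtimes share one code path for deciding whether instrumentation is active.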

Default Attributes

  1. Add OTEL defined semantic attributes by default.
    1. https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/http/
    2. https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/rpc/
    3. https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/exceptions/
  2. For later:
    1. we can configure some default jcloud cluster deployment identifiers.
    2. we can allow users to add additional global attributes by parsing environment variables with a prefix like OTEL_CUSTOM_ATTRIBUTE_APPLICATION_ID=lottiefiles which will add a global tracing attribute as APPLICATION_ID=lottiefiles.
    3. docker image name and tag?
  3. Should I include the current telemetry info on the tracing and metrics providers? This information would be added automatically to all spans created from the TRACER and METER objects.
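
The environment-variable idea from item 2 above (a prefix such as OTEL_CUSTOM_ATTRIBUTE_) could be parsed as in this sketch. The function name custom_attributes and the decision to ignore a bare prefix with no key are assumptions; only the prefix and the APPLICATION_ID=lottiefiles example come from the proposal.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "OTEL_CUSTOM_ATTRIBUTE_"


def custom_attributes(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect global attributes from variables that start with PREFIX.

    The prefix is stripped, so OTEL_CUSTOM_ATTRIBUTE_APPLICATION_ID=lottiefiles
    yields the attribute APPLICATION_ID=lottiefiles.
    """
    env = os.environ if environ is None else environ
    return {
        key[len(PREFIX):]: value
        for key, value in env.items()
        # Assumption: a variable that is only the prefix carries no key, so skip it.
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }


if __name__ == "__main__":
    sample = {
        "OTEL_CUSTOM_ATTRIBUTE_APPLICATION_ID": "lottiefiles",
        "PATH": "/usr/bin",
    }
    print(custom_attributes(sample))  # prints: {'APPLICATION_ID': 'lottiefiles'}
```

The resulting dictionary could then be attached once to the tracing and metrics providers rather than to each span.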

Default Tracing in a Flow

  1. Trace requests in the request handler by ensuring that the parent span is properly propagated to the Executor. Communication from Gateway → Executor and from Executor → Executor must be covered by default.
  2. Ensure that the attributes of HTTP, gRPC, and WebSocket requests (depending on the Gateway protocol) from Client → Gateway are correctly propagated.
  3. Provide the parent span from the request handler to the request method. The user must add code to cover any additional operations within the requests method using the provided helper methods to ensure correct propagation.
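
To illustrate what propagating the parent span between hops involves, here is a hand-rolled sketch based on the W3C Trace Context traceparent header (version 00, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and 2-hex-digit flags). In practice OpenTelemetry's propagators would do this work; the names SpanContext, extract, start_child, and inject are hypothetical helpers for this example only.

```python
import re
import secrets
from typing import Dict, Mapping, NamedTuple, Optional

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    flags: str


def extract(headers: Mapping[str, str]) -> Optional[SpanContext]:
    """Read the caller's span context from incoming headers, if valid."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid according to the W3C specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, flags)


def start_child(parent: Optional[SpanContext]) -> SpanContext:
    """Create a child of the parent span, or a standalone span if there is none."""
    if parent is None:
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8), "01")
    # A child keeps the trace ID and flags but gets a fresh span ID.
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.flags)


def inject(context: SpanContext, headers: Dict[str, str]) -> None:
    """Write the span context into outgoing headers for the next hop."""
    headers["traceparent"] = f"00-{context.trace_id}-{context.span_id}-{context.flags}"


if __name__ == "__main__":
    incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    gateway_span = start_child(extract(incoming))
    outgoing: Dict[str, str] = {}
    inject(gateway_span, outgoing)  # headers sent from Gateway to Executor
    print(outgoing["traceparent"])
```

Each hop repeats the same extract, start_child, inject sequence, which is what keeps all spans of one request under a single trace ID.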

Exporter Configuration

  1. Default OTEL trace exporter configurations can be provided as per OTEL recommendations.
  2. Default Prometheus metrics exporter can be provided as per OTEL recommendations.
  3. Use the standard YAML parsers provided in the SDK.
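
As one possible reading of "defaults as per OTEL recommendations", the sketch below resolves the trace exporter endpoint for OTLP over gRPC from the standard OpenTelemetry environment variables, falling back to the specification's default of http://localhost:4317. The function name resolve_traces_endpoint is hypothetical, and the sketch deliberately ignores OTLP over HTTP, where the rules differ.

```python
import os
from typing import Mapping, Optional

# Default OTLP/gRPC endpoint defined by the OpenTelemetry specification.
DEFAULT_OTLP_GRPC_ENDPOINT = "http://localhost:4317"


def resolve_traces_endpoint(environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the trace exporter endpoint (OTLP over gRPC only).

    Precedence: the signal-specific variable, then the general variable,
    then the specification default.
    """
    env = os.environ if environ is None else environ
    return (
        env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
        or env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
        or DEFAULT_OTLP_GRPC_ENDPOINT
    )


if __name__ == "__main__":
    print(resolve_traces_endpoint({}))  # prints: http://localhost:4317
```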

Documentation

  • New page?

girishc13 avatar Sep 19 '22 08:09 girishc13