jina
Add OpenTelemetry to increase observability
Describe the feature
Improve observability using OpenTelemetry and the available SDK implementations to:
- enable tracing of requests
- standardize the already available Prometheus metrics collection
The OpenTelemetry API is needed to enable measurement, and the SDK is required to collect and export metrics.
Standard APIs are available for monitoring different operations at varying granularity, which reduces the effort of integrating different metrics collection and aggregation vendors. The feature is optional and can be configured independently by users depending on their telemetry collector implementation.
Your proposal
Add OpenTelemetry APIs and standard SDKs to:
- trace network requests within the Flow ecosystem
- convert the existing Prometheus collection and export to the new OpenTelemetry-compatible metrics collector
- allow users to better track operations in the executors
Available packages:
- https://pypi.org/project/opentelemetry-api/
- https://pypi.org/project/opentelemetry-sdk/
- https://pypi.org/project/opentelemetry-instrumentation-grpc/
Environment
Screenshots
Below are some points of consideration for introducing the first version of OpenTelemetry support.
Configuration Options
~1. Use JINA_ENABLE_OTEL_TRACING to enable tracing everywhere.~
~2. Use JINA_ENABLE_OTEL_METRICS to enable metrics everywhere.~
~3. The user could override the environment variables at the Gateway or Executor level.~
- Provide pod-level parser options:
  - '--opentelemetry-tracing' to enable tracing.
  - '--opentelemetry-metrics' to enable metrics.
- Add the above two options to the client parser to enable OpenTelemetry tracing and metrics at the client level.
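As a rough illustration of the two switches above, here is a minimal sketch using the standard library's argparse. The `build_parser` function and the help texts are hypothetical and do not reflect Jina's actual parser code. Only the two flag names come from the proposal.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical pod/client parser exposing the two proposed switches."""
    parser = argparse.ArgumentParser(description="OpenTelemetry options (sketch)")
    # Both features are opt-in, so they default to False.
    parser.add_argument(
        "--opentelemetry-tracing",
        action="store_true",
        default=False,
        help="enable OpenTelemetry tracing",
    )
    parser.add_argument(
        "--opentelemetry-metrics",
        action="store_true",
        default=False,
        help="enable OpenTelemetry metrics",
    )
    return parser


# argparse converts the dashes in the flag names to underscores.
args = build_parser().parse_args(["--opentelemetry-tracing"])
print(args.opentelemetry_tracing, args.opentelemetry_metrics)  # True False
```

Keeping tracing and metrics as separate flags lets a user enable one without the other, which matches the "independently configured" goal stated in the feature description.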
Package
- Name the package instrumentation to provide a clear separation between the existing telemetry and the Prometheus client.
- ~This package will provide the global TRACER and METER classes that will:~
  - ~provide helper methods to create a span from a parent span if one exists, otherwise create a standalone span.~
  - ~provide helper methods to create instruments from the metrics provider.~
- The package will contain the InstrumentationMixin, which instantiates the tracer and the metrics providers based on the self.args argument. This mixin can be added to any class that wants to create a trace or measure an operation.
- The InstrumentationMixin has been added to BaseClient and AsyncNewLoopRuntime, which are used as a base for the Client, Gateway and Runtime abstractions.
- Further, the InstrumentationMixin will provide static objects and methods for grpc.aio interceptors for tracing grpc servers and channels. The grpc.aio interceptors are provided in the instrumentation package because the official opentelemetry-python contrib doesn't yet support implementations for grpc.aio.Server abstractions. These can be removed once the contrib package adds the required support.
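To make the mixin idea above more concrete, here is a minimal sketch of a class that sets up a tracer from parsed arguments. The names `InstrumentationSketchMixin`, `DemoRuntime`, `_setup_instrumentation` and the `opentelemetry_tracing` attribute are assumptions made for illustration, not the proposed implementation. The OpenTelemetry imports are optional, so the sketch falls back to no tracing when the packages are missing. No exporter is attached, so the spans are created but not sent anywhere.

```python
from argparse import Namespace

try:  # the OpenTelemetry packages are treated as optional dependencies
    from opentelemetry.sdk.resources import SERVICE_NAME, Resource
    from opentelemetry.sdk.trace import TracerProvider

    OTEL_AVAILABLE = True
except ImportError:
    OTEL_AVAILABLE = False


class InstrumentationSketchMixin:
    """Hypothetical mixin that creates a tracer based on self.args."""

    def _setup_instrumentation(self, name: str) -> None:
        enabled = getattr(self.args, "opentelemetry_tracing", False)
        self.tracer_provider = None
        self.tracer = None
        if enabled and OTEL_AVAILABLE:
            # Use a provider owned by this object instead of the global one.
            resource = Resource.create({SERVICE_NAME: name})
            self.tracer_provider = TracerProvider(resource=resource)
            self.tracer = self.tracer_provider.get_tracer(name)


class DemoRuntime(InstrumentationSketchMixin):
    """Hypothetical stand-in for a Client, Gateway or Runtime class."""

    def __init__(self, args: Namespace) -> None:
        self.args = args
        self._setup_instrumentation(name="demo-runtime")

    def handle(self, payload: str) -> str:
        if self.tracer is None:
            # Tracing is disabled or unavailable: do the work untraced.
            return payload.upper()
        with self.tracer.start_as_current_span("handle"):
            return payload.upper()


runtime = DemoRuntime(Namespace(opentelemetry_tracing=False))
print(runtime.handle("ping"))  # PING
```

Holding the provider on the instance rather than registering it globally is one possible design choice. It keeps several runtimes in the same process from overwriting each other's configuration.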
Default Attributes
- Add OTEL-defined semantic attributes by default.
- For later:
  - we can configure some default jcloud cluster deployment identifiers.
  - we can allow users to add additional global attributes by parsing environment variables with a prefix, like OTEL_CUSTOM_ATTRIBUTE_APPLICATION_ID=lottiefiles, which will add a global tracing attribute as APPLICATION_ID=lottiefiles.
  - docker image name and tag?
- Should I include the current telemetry info on the tracing and metrics providers? This information would then be added automatically to all spans created from the TRACER and METER objects.
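The environment-variable idea above could be sketched as follows. The `custom_attributes` helper is hypothetical. Only the `OTEL_CUSTOM_ATTRIBUTE_` prefix and the `APPLICATION_ID=lottiefiles` example come from the proposal. The resulting dictionary is the kind of mapping that could later be passed as resource attributes to the tracer and meter providers.

```python
import os
from typing import Dict, Mapping

PREFIX = "OTEL_CUSTOM_ATTRIBUTE_"


def custom_attributes(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect variables carrying the prefix and strip the prefix from the key."""
    return {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        # Skip a bare prefix with no attribute name after it.
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }


example_env = {
    "OTEL_CUSTOM_ATTRIBUTE_APPLICATION_ID": "lottiefiles",
    "PATH": "/usr/bin",
}
print(custom_attributes(example_env))  # {'APPLICATION_ID': 'lottiefiles'}
```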
Default Tracing in a Flow
- Trace requests in the request handler by ensuring that the parent span is properly propagated to the executor. Gateway → Executor and Executor → Executor communication must be covered by default.
- Ensure that the attributes of http, grpc and websocket requests (depending on the Gateway protocol) are correctly propagated from Client → Gateway.
- Provide the parent span from the request handler to the request method. The user must add code to cover any additional operations within the request method, using the provided helper methods to ensure correct propagation.
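The propagation requirement above can be illustrated without any OpenTelemetry dependency. OpenTelemetry's default propagator carries the parent span in a W3C Trace Context `traceparent` header of the form `00-<trace-id>-<span-id>-<flags>`. The sketch below imitates that mechanism with the standard library only. The `inject`, `extract` and `start_span` helpers are hypothetical stand-ins and are not the OpenTelemetry API.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# version 00, 32 hex chars of trace id, 16 hex chars of span id, 2 hex chars of flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(trace_id: str, span_id: str, carrier: Dict[str, str]) -> None:
    """Write the current span into outgoing request metadata (sampled flag 01)."""
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-01"


def extract(carrier: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Read (trace_id, parent_span_id) from incoming metadata, if present and valid."""
    match = TRACEPARENT.match(carrier.get("traceparent", ""))
    return (match.group(1), match.group(2)) if match else None


def start_span(carrier: Dict[str, str]) -> Tuple[str, str, Optional[str]]:
    """Create a child span if a parent was propagated, otherwise a root span."""
    parent = extract(carrier)
    trace_id = parent[0] if parent else secrets.token_hex(16)
    parent_span_id = parent[1] if parent else None
    return trace_id, secrets.token_hex(8), parent_span_id


# Gateway: no incoming context, so a new trace starts here.
gw_trace, gw_span, gw_parent = start_span({})
metadata: Dict[str, str] = {}
inject(gw_trace, gw_span, metadata)

# Executor: continues the same trace with the Gateway span as its parent.
ex_trace, ex_span, ex_parent = start_span(metadata)
print(ex_trace == gw_trace, ex_parent == gw_span)  # True True
```

The same pattern applies to Executor → Executor hops: each hop extracts the incoming context, starts its own span, and injects that span into the metadata of the next request.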
Exporter Configuration
- Default OTEL trace exporter configurations can be provided as per OTEL recommendations.
- Default Prometheus metrics exporter can be provided as per OTEL recommendations.
- Use the standard YAML parsers provided in the SDK.
Documentation
- New page?