
feat(telemetry): add OpenTelemetry OTLP support with comprehensive observability

Open mkm29 opened this issue 5 months ago • 8 comments

Details

Adds comprehensive OpenTelemetry (OTLP) support to authentik for distributed tracing, metrics, and structured logging. This implementation provides enterprise-grade observability across both Python and Go components while maintaining backward compatibility with existing Prometheus and Sentry integrations.

Key features:

  • OTLP gRPC/HTTP export support with configurable endpoints
  • Unified configuration across Python Django and Go microservices
  • Django middleware for automatic request tracing
  • Go HTTP middleware with otelhttp instrumentation
  • Adaptive sampling (excludes health checks, samples errors at a higher rate)
  • Auto-instrumentation for Django, PostgreSQL, Redis, Celery, and Go HTTP
  • Custom metrics for flows, policies, authentication events, and LDAP operations
  • Graceful degradation when OpenTelemetry packages are unavailable
  • Latest OpenTelemetry Go v1.37.0 and Python packages
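The thread does not show how the PR implements adaptive sampling, so the following is only a minimal sketch of the idea described above: never sample health checks, and sample errors at a higher rate than normal traffic. The function name, the rates, and the health-check paths are assumptions made for illustration. Because the outcome of a request is only known once it has finished, a decision like this would have to be made at the end of the request (or by a tail-sampling component), not when the span starts.

```python
import random

# Assumed health-check paths; adjust to whatever the deployment actually exposes.
HEALTH_PATHS = ("/-/health/live/", "/-/health/ready/")


def should_sample(path, is_error, base_rate=0.1, error_rate=1.0, rng=random.random):
    """Hypothetical helper: decide whether a finished request should be traced.

    - Health checks are never sampled.
    - Failed requests are sampled at `error_rate`.
    - Everything else is sampled at `base_rate`.
    `rng` returns a float in [0, 1) and is injectable so the decision is testable.
    """
    if path.startswith(HEALTH_PATHS):
        return False
    rate = error_rate if is_error else base_rate
    return rng() < rate
```

With the defaults, roughly one in ten successful requests and every failed request would be kept.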

Configuration:

telemetry:
  otlp:
    enabled: false
    endpoint: "localhost:4317"
    protocol: "grpc"
    traces_sample_rate: 0.1
    service_name: "authentik"
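
As a rough illustration of how these keys could be validated once loaded, here is a sketch that maps the `telemetry.otlp` section onto a dataclass. `OTLPConfig` here is a hypothetical Python type, not the Go struct the PR adds, `load_otlp_config` is a made-up helper, and the accepted `protocol` values (`grpc`, `http`) are an assumption based on the "gRPC/HTTP" wording in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OTLPConfig:
    """Hypothetical mirror of the telemetry.otlp section; defaults match the snippet above."""

    enabled: bool = False
    endpoint: str = "localhost:4317"
    protocol: str = "grpc"
    traces_sample_rate: float = 0.1
    service_name: str = "authentik"

    def __post_init__(self):
        # Assumed set of protocol names; the thread only says "gRPC/HTTP".
        if self.protocol not in ("grpc", "http"):
            raise ValueError(f"unsupported OTLP protocol: {self.protocol!r}")
        if not 0.0 <= self.traces_sample_rate <= 1.0:
            raise ValueError("traces_sample_rate must be between 0.0 and 1.0")


def load_otlp_config(raw: dict) -> OTLPConfig:
    """Build an OTLPConfig from an already-parsed config dict, using defaults for missing keys."""
    section = raw.get("telemetry", {}).get("otlp", {})
    return OTLPConfig(**section)
```

Validating at load time means a typo such as `traces_sample_rate: 10` fails at startup instead of silently sampling everything.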

Files added:

Python telemetry:

  • authentik/lib/telemetry/ - Core telemetry implementation
  • authentik/lib/tests/test_telemetry*.py - Comprehensive test suite (28 tests)
  • Extended ConfigLoader with get_float() method

Go telemetry:

  • internal/telemetry/provider.go - OpenTelemetry provider with OTLP configuration
  • internal/telemetry/middleware.go - HTTP, LDAP, and Auth middleware
  • internal/telemetry/telemetry.go - Global interface and convenience functions
  • internal/telemetry/telemetry_test.go - Comprehensive Go test suite
  • internal/telemetry/example_integration.go - Integration examples
  • internal/config/struct.go - Added TelemetryConfig and OTLPConfig
  • Updated go.mod with latest OpenTelemetry Go v1.37.0 dependencies
  • Updated authentik/lib/default.yml with telemetry configuration

Documentation:

  • OTLP_DOCUMENTATION.md - Consolidated implementation guide with Mermaid architecture diagram

closes #12854


Checklist

  • [x] Local tests pass (ak test authentik/)
  • [x] The code has been formatted (make lint-fix)

If an API change has been made

  • [x] The API schema has been updated (make gen-build)

If changes to the frontend have been made

  • [x] The code has been formatted (make web)

If applicable

  • [x] The documentation has been updated
  • [x] The documentation has been formatted (make docs)

mkm29 avatar Jul 27 '25 20:07 mkm29

Deploy Preview for authentik-storybook ready!

Name Link
Latest commit 370dfaeee9c1b7824c383dfe7673413647880a67
Latest deploy log https://app.netlify.com/projects/authentik-storybook/deploys/689b527687e26b00081be1fa
Deploy Preview https://deploy-preview-15804--authentik-storybook.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

netlify[bot] avatar Jul 27 '25 20:07 netlify[bot]

Deploy Preview for authentik-docs ready!

Name Link
Latest commit 370dfaeee9c1b7824c383dfe7673413647880a67
Latest deploy log https://app.netlify.com/projects/authentik-docs/deploys/689b5276f28be9000817ca65
Deploy Preview https://deploy-preview-15804--authentik-docs.netlify.app

netlify[bot] avatar Jul 27 '25 20:07 netlify[bot]

Deploy Preview for authentik-integrations ready!

Name Link
Latest commit 370dfaeee9c1b7824c383dfe7673413647880a67
Latest deploy log https://app.netlify.com/projects/authentik-integrations/deploys/689b5276208a200008f08a9b
Deploy Preview https://deploy-preview-15804--authentik-integrations.netlify.app

netlify[bot] avatar Jul 27 '25 20:07 netlify[bot]

Any updates on this PR?

mkm29 avatar Aug 02 '25 21:08 mkm29

We're in the middle of a release, so this is on the backlog for now. I'll take a look once we have released.

rissson avatar Aug 04 '25 12:08 rissson

Ping on this. We are evaluating Authentik and the topic of observability is a question mark: we are on NewRelic so we can't use the Sentry integration that is already available. I think just supporting an open standard in OpenTelemetry makes more sense than trying to natively integrate every observability platform individually.

So thumbs up on this feature from me 👍

severin avatar Nov 27 '25 14:11 severin

I am open (heh) to having a more open monitoring standard instead of just Sentry. However, I am not too familiar with the OTEL ecosystem, and it feels like this adds a lot of code and is very verbose, but that might just be how OTEL works.

We may consider this in the future

BeryJu avatar Dec 04 '25 17:12 BeryJu

> I am open (heh) to having a more open monitoring standard instead of just Sentry. However, I am not too familiar with the OTEL ecosystem, and it feels like this adds a lot of code and is very verbose, but that might just be how OTEL works.
>
> We may consider this in the future

I think I might be in the intersection of Authentik developers and OTel developers. I don't think there should be an authentik.lib.telemetry package. The OpenTelemetry API itself is already an abstraction layer across different telemetry vendors; there's no advantage to adding yet another abstraction layer on top of it.

In general, library code should use opentelemetry.metrics.get_meter / opentelemetry.trace.get_tracer to construct a Meter or Tracer object on which to report telemetry (and if it makes sense, you could expose the meter_provider/tracer_provider arguments to allow the library's user to attach to a custom MeterProvider/TracerProvider). If OTel is not configured in the process, those functions will return a noop Meter or Tracer that has very little overhead.

So for library/package code, we should be sprinkling in creation of metrics and trace spans as appropriate. The code in this PR imports directly from opentelemetry.sdk.*, but instrumented code should not be touching opentelemetry.sdk.*. MeterProvider and TracerProvider are available directly as opentelemetry.metrics.MeterProvider and opentelemetry.trace.TracerProvider. (OpenTelemetry calls this the "API" vs the "SDK". The API is lightweight and contains the surface that instrumented code should use to report telemetry, plus a noop implementation. It's fine and expected for library code to have a required dependency on the OpenTelemetry API. The OpenTelemetry SDK, by contrast, is a heavyweight implementation of the API; users don't have to use it as their API implementation, and libraries should not depend on the SDK.)

As far as the middleware goes, there's already rich support for OpenTelemetry in Django with the opentelemetry-instrumentation-django package. Authentik shouldn't have its own instrumentation for packages that already have supported upstream instrumentation.

So all of that covers how the code should be instrumented, which is separate from any configuration of where that telemetry should be directed. There are two ways that OpenTelemetry wants you to do that. The recommended way is to use something called "zero-code instrumentation" - basically OTel provides a launcher (and a k8s operator) that will inject user-controlled telemetry configuration, without the application ever needing to directly interact with the SDK at all. Unfortunately, the Go support is much less complete. They're working on standardizing on a single config file format across all languages, but that's currently experimental. The alternative is to manually construct and install an SDK object with the desired configuration, and that should only happen in a main() function or equivalent. Unfortunately, that limits the user to only using a single API implementation, and only configuring it in the ways that you chose to expose (for example, you might offer a way to configure a Jaeger exporter, but if a user wants to use Zipkin they're SOL).

quentinmit avatar Dec 04 '25 19:12 quentinmit