Zitadel OpenTelemetry traces resulting in high cardinality

Open hreddy-klaviyo opened this issue 1 year ago • 1 comments

Preflight Checklist

  • [X] I could not find a solution in the existing issues, docs, nor discussions
  • [X] I have joined the ZITADEL chat

Describe your problem

We set up the config for Zitadel to record OTel traces and OTel metrics on each endpoint: https://github.com/zitadel/zitadel/blob/b055d1d9e67587eacce8e34649085bcd3268a055/cmd/defaults.yaml#L11

We've noticed that the cardinality of two specific operations is extremely high:

{ZITADEL_DOMAIN}/v2beta/idp_intents/<idp_intent_id>
{ZITADEL_DOMAIN}/v2beta/oidc/auth_requests/<auth_request_id>

These IDs are generated on each login, so every login produces a new operation name. This causes a lot of issues on the ingesting side and is further exacerbated by the metrics that are computed per operation.

e.g.

latency_bucket{operation="{ZITADEL_DOMAIN}/v2beta/oidc/auth_requests/V2_270990429988614816",service_name="ZITADEL",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="5"} 0
latency_bucket{operation="{ZITADEL_DOMAIN}/v2beta/oidc/auth_requests/V2_270990429988614816",service_name="ZITADEL",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="10"} 10
...

Describe your ideal solution

These operations should record the auth request IDs and IDP intent IDs as tags (attributes) on the span rather than including them in the path used as the operation name.
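A minimal sketch of that idea in Go, using only the standard library: the concrete path is split into a fixed route template (usable as the operation name) and a map holding the per-login ID (usable as span tags). The function name `spanNameAndAttrs` and the attribute keys `zitadel.idp_intent.id` and `zitadel.auth_request.id` are made up for illustration and are not taken from ZITADEL or from the OpenTelemetry semantic conventions.

```go
package main

import (
	"fmt"
	"strings"
)

// idRoutes lists the two high-cardinality endpoints from this issue.
// The attribute keys are hypothetical names chosen for this sketch.
var idRoutes = []struct {
	prefix   string // fixed part of the path, including the trailing slash
	template string // low-cardinality name to use for the operation
	attrKey  string // hypothetical tag name that carries the ID
}{
	{"/v2beta/idp_intents/", "/v2beta/idp_intents/{idp_intent_id}", "zitadel.idp_intent.id"},
	{"/v2beta/oidc/auth_requests/", "/v2beta/oidc/auth_requests/{auth_request_id}", "zitadel.auth_request.id"},
}

// spanNameAndAttrs returns a route template and the extracted ID as a tag.
// Paths that do not match one of the known prefixes, or that have further
// segments after the ID, are returned unchanged with no tags.
func spanNameAndAttrs(path string) (string, map[string]string) {
	for _, r := range idRoutes {
		if !strings.HasPrefix(path, r.prefix) {
			continue
		}
		id := strings.TrimPrefix(path, r.prefix)
		if id != "" && !strings.Contains(id, "/") {
			return r.template, map[string]string{r.attrKey: id}
		}
	}
	return path, map[string]string{}
}

func main() {
	name, attrs := spanNameAndAttrs("/v2beta/oidc/auth_requests/V2_270990429988614816")
	fmt.Println(name, attrs)
}
```

With this split, a metric such as latency_bucket would get one series per template instead of one per login, while the ID stays available on the individual span.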

Version

2.53.4

Environment

Self-hosted

Additional Context

Reducing sampling to 0.1 helped a bit to buy us some time :)

hreddy-klaviyo avatar Jun 10 '24 15:06 hreddy-klaviyo

@adlerhurst when checking this, we should also remove the host from the name

livio-a avatar Jun 12 '24 06:06 livio-a

Currently open questions:

How do we map a path to its route, e.g. so that PUT /users/1235 becomes PUT /users/{user_id}?

How do we add context information to the span: as baggage or as a label?
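For the first question, one possible approach is to compare the request path against known route templates segment by segment. The sketch below is standard-library Go and is not how ZITADEL or grpc-gateway resolves routes; `matchRoute`, `routeName`, and the "unmatched" fallback name are hypothetical helpers introduced only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// matchRoute compares a concrete path with a template segment by segment.
// A template segment wrapped in braces, such as {user_id}, matches any
// non-empty path segment and is returned as a named parameter.
func matchRoute(template, path string) (map[string]string, bool) {
	tSegs := strings.Split(strings.Trim(template, "/"), "/")
	pSegs := strings.Split(strings.Trim(path, "/"), "/")
	if len(tSegs) != len(pSegs) {
		return nil, false
	}
	params := map[string]string{}
	for i, t := range tSegs {
		isParam := len(t) > 2 && strings.HasPrefix(t, "{") && strings.HasSuffix(t, "}")
		if isParam {
			if pSegs[i] == "" {
				return nil, false
			}
			params[t[1:len(t)-1]] = pSegs[i]
			continue
		}
		if t != pSegs[i] {
			return nil, false
		}
	}
	return params, true
}

// routeName returns "METHOD template" for the first matching template.
// Unknown paths collapse into a single "METHOD unmatched" name so that
// they cannot create new operation names.
func routeName(method, path string, templates []string) (string, map[string]string) {
	for _, t := range templates {
		if params, ok := matchRoute(t, path); ok {
			return method + " " + t, params
		}
	}
	return method + " unmatched", nil
}

func main() {
	templates := []string{"/users/{user_id}"}
	name, params := routeName("PUT", "/users/1235", templates)
	fmt.Println(name, params)
}
```

The returned parameters are the values that would then be attached to the span, whichever of baggage or labels is chosen for the second question.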

adlerhurst avatar Jul 15 '24 08:07 adlerhurst

How do we map a path to its route, e.g. so that PUT /users/1235 becomes PUT /users/{user_id}?

https://grpc-ecosystem.github.io/grpc-gateway/docs/operations/annotated_context/ might help

livio-a avatar Jul 15 '24 08:07 livio-a