feat: add OTEL instrumentation for authentication and handlers
Follow up to #23727, improving observability for the OIDC authentication flow:
- Add OTEL tracer initialization in server, session, oidc packages
- Add span creation to all methods for tracing authentication flow with SSO
- Wrap HTTP handlers with otelhttp.NewHandler to capture the root span
- Add error status and attributes to spans'
- Capture cache read/write timing
Checklist:
- [x] Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
- [x] The title of the PR states what changed and the related issues number (used for the release note).
- [x] The title of the PR conforms to the Title of the PR
- [x] I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
- [x] I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
- [x] Does this PR require documentation updates?
- [x] I've updated documentation as required by this PR.
- [x] I have signed off all my commits as required by DCO
- [x] I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
- [x] My build is green (troubleshooting builds).
- [x] My new feature complies with the feature status guidelines.
- [x] I have added a brief description of why this PR is necessary and/or what this PR solves.
- [ ] Optional. My organization is added to USERS.md.
- [ ] Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).
:x: Preview Environment deleted from Bunnyshell
Available commands (reply to this comment):
- :rocket:
/bns:deployto deploy the environment
Codecov Report
:x: Patch coverage is 73.55372% with 32 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 62.52%. Comparing base (3bf3d8a) to head (faa209d).
:warning: Report is 7 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| util/oidc/oidc.go | 71.95% | 22 Missing and 1 partial :warning: |
| util/oidc/provider.go | 64.28% | 5 Missing :warning: |
| server/server.go | 76.47% | 4 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #25296 +/- ##
==========================================
+ Coverage 62.49% 62.52% +0.02%
==========================================
Files 351 351
Lines 49602 49697 +95
==========================================
+ Hits 31001 31072 +71
- Misses 15631 15655 +24
Partials 2970 2970
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Could you share the snapshot of metrics on a visualization tool?
Could you share the snapshot of metrics on a visualization tool?
Here are some examples of the new spans:
server.ArgoCDServer.Authenticate
This is a new span to capture an authentication flow:
session.SessionService/GetUserInfo
Captured by gRPC middleware, showing new oidc spans
cluster.SettingsService/Get
Captured by gRPC middleware, showing new oidc spans
server.ClientApp/HandleCallback
This was captured by the new OTEL middleware handler:
Could you share the snapshot of metrics on a visualization tool?
Capturing cache read/write timing. We should also consider extending the cache interface in the future to accept a context so it could add its own spans.
I thought it would be Jaeger, since I worked on OTEL instrumentation part. 😄 Thanks for sharing those.