minder icon indicating copy to clipboard operation
minder copied to clipboard

Add request id to logs.

Open blkt opened this issue 1 year ago • 4 comments

Summary

This change adds a GRPC interceptor that fetches the header Grpc-Metadata-Request-Id and adds it to the logger. It also returns the request id to the client in the trailer, which is helpful for debugging.

CLI command minder history list was modified to show its usage, but those changes should be reverted.

Change Type

  • [ ] Bug fix (resolves an issue without affecting existing features)
  • [X] Feature (adds new functionality without breaking changes)
  • [ ] Breaking change (may impact existing functionalities or require documentation updates)
  • [ ] Documentation (updates or additions to documentation)
  • [ ] Refactoring or test improvements (no bug fixes or new functionality)

Testing

Manual tests.

Review Checklist:

  • [X] Reviewed my own code for quality and clarity.
  • [X] Added comments to complex or tricky code sections.
  • [ ] Updated any affected documentation.
  • [ ] Included tests that validate the fix or feature.
  • [X] Checked that related changes are merged.

blkt avatar Sep 12 '24 07:09 blkt

Coverage Status

coverage: 53.045% (-0.06%) from 53.105% when pulling 4c55b9b382ce014bf84281e61e227fbec3676514 on enh/request-id-interceptor into 15fd53d4009f24d907687022025677f89f5c83b5 on main.

coveralls avatar Sep 12 '24 07:09 coveralls

We might want to use https://www.w3.org/TR/trace-context/ or https://uptrace.dev/opentelemetry/opentelemetry-traceparent.html

evankanderson avatar Sep 17 '24 13:09 evankanderson

@evankanderson while estimating work to improve logging I had a look at the documents you linked. My take is that the capabilities described in those specs far exceed our current need, which is to track a request execution from start to finish in logs.

My take is that we're better off polishing this work, possibly changing the header's key (not the value), and merge it.

blkt avatar Oct 02 '24 12:10 blkt

@evankanderson while estimating work to improve logging I had a look at the documents you linked. My take is that the capabilities described in those specs far exceed our current need, which is to track a request execution from start to finish in logs.

My take is that we're better off polishing this work, possibly changing the header's key (not the value), and merge it.

Sounds good; I'd thought this would be something like:

import "go.opentelemetry.io/otel/trace"

// ...
sc := trace.SpanContextFromContext(ctx)
ctx = zerolog.Ctx(ctx).With().Str("trace_id", sc.TraceID()).Logger().WithContext(ctx)

If there's a bunch more content, then I'd agree on doing our own thing.

evankanderson avatar Oct 02 '24 13:10 evankanderson

Here is an example of the output

curl -v -H "Authorization: Bearer $TOKEN" 'http://localhost:8080/api/v1/user'
> GET /api/v1/user HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer xxx
>
* Request completely sent off
< HTTP/1.1 200 OK
< Content-Length: 933
< Content-Type: application/json
< Grpc-Metadata-Content-Type: application/grpc
< Grpc-Metadata-Request-Id: 062ff9f8-e94a-4742-8cc1-8c07b817c286
< Date: Tue, 15 Oct 2024 12:33:20 GMT
<
* Connection #0 to host localhost left intact
* ...

Logs would be like this

{"level":"info","Resource":{"service":"minder.v1.UserService","method":"GetUser"},"request_id":"062ff9f8-e94a-4742-8cc1-8c07b817c286","caller":"/Users/michelangelo/stacklok/minder/internal/controlplane/handlers_authz.go:60","Timestamp":1728995600029354000,"message":"Bypassing setting up context"}
{"level":"info","Resource":{"service":"minder.v1.UserService","method":"GetUser"},"request_id":"062ff9f8-e94a-4742-8cc1-8c07b817c286","caller":"/Users/michelangelo/stacklok/minder/internal/controlplane/handlers_authz.go:86","Timestamp":1728995600029372000,"message":"Bypassing project authorization"}

Using Grpc-Metadata- prefix gets us the least amount of changes to the codebase while still getting the result, but we could write our own matcher for the header.

blkt avatar Oct 15 '24 12:10 blkt

Pull Request Test Coverage Report for Build 11351156474

Details

  • 0 of 14 (0.0%) changed or added relevant lines in 2 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.04%) to 54.468%

Changes Missing Coverage Covered Lines Changed/Added Lines %
internal/controlplane/server.go 0 3 0.0%
internal/logger/logging_interceptor.go 0 11 0.0%
<!-- Total: 0 14
Files with Coverage Reduction New Missed Lines %
internal/eea/eea.go 2 73.73%
<!-- Total: 2
Totals Coverage Status
Change from base Build 11344831838: -0.04%
Covered Lines: 14929
Relevant Lines: 27409

💛 - Coveralls

coveralls avatar Oct 15 '24 12:10 coveralls

I think this is a good start, I see the result both on the server side in the middleware and the client side in the headers (verified with curl).

Regarding the server vs client side headers - what if we had another header like e.g. Request-Correlation-ID that the client could set in case a single client operation performs multiple requests? In that case the client might set the correlation to request ID of the first operation. There would be still the unique request ID and the correlation would really be just a hint for easier search of requests in the logs?

I'd wait here for a "bigger" tracing solution, and then have the top-level command use some sort of trace-parent to correlate the requests.

One thing that may not be obvious here, but it looks like the request_id header won't pass through any usages of watermill or other asynchronous processing. I think that's fine for now, but it's worth being aware of given how much async processing we do.

evankanderson avatar Oct 15 '24 23:10 evankanderson

Regarding the server vs client side headers - what if we had another header like e.g. Request-Correlation-ID that the client could set in case a single client operation performs multiple requests?

I agree with @evankanderson in this, tracing across multiple requests requires some amount of shared state and more changes to the codebase. This is just a helper to filter down logs quickly.

One thing that may not be obvious here, but it looks like the request_id header won't pass through any usages of watermill or other asynchronous processing.

That's correct, for that case we already track Github's delivery id that sort-of solves the same problem. https://github.com/mindersec/minder/blob/main/internal/providers/github/webhook/hook.go#L127 https://github.com/mindersec/minder/blob/main/internal/providers/github/webhook/app.go#L151 We might want to spend some effort ensuring this is logged correctly downstream watermill. I'll create a task.

blkt avatar Oct 16 '24 10:10 blkt