opa icon indicating copy to clipboard operation
opa copied to clipboard

Add the req_id attribute on the decision logs

Open humbertoc-silva opened this issue 1 year ago • 18 comments

What is the underlying problem you're trying to solve?

I'm working on OPA logs (request/response logs, decision logs, status logs, etc.). I've noticed that the request/response logs have a correlation through the req_id attribute. I've seen that debug logging with the print built-in function uses this attribute too. This is useful as I can correlate all logs related to a specific request/response.

I would like to know if is it a good idea to have the req_id attribute on decision logs too. So the correlation would be complete and I could analyze all the logs generated by OPA when handling a single request/response.

Describe the ideal solution

The ideal solution is to add the attribute req_id to the decision log.

Additional Context

I am working with the console decision log but maybe the req_id could be useful when the decision log is configured to send the logs to a control plane too because I can search and correlate the other logs based on this attribute.

humbertoc-silva avatar Aug 12 '22 13:08 humbertoc-silva

Hi @humbertoc-silva ! And thanks for opening this. Maybe I'm missing something obvious here :) What does this add compared to the decision ID generated for a decision and added to the decision log entry? The request ID would still be generated by the OPA server, right?

It might also be worth to point out that the req_id attribute is currently coupled to the "request logging" facility, which is only enabled at log level INFO or below. If you'd set the log level to ERROR — which is the recommended level for production — you'll neither see this type of logs in the console, or have a req_id created. There's an open issue to allow configuring that logger independently from other logging, so perhaps we'd need to do that first if we want to have this implemented.

anderseknert avatar Aug 12 '22 20:08 anderseknert

Hi @anderseknert,

The decision log ID has its own value, generally, a client application will get it when receiving a response from OPA and the responsible team will use it to filter the logs. But it can not help when we need to correlate the other logs generated by the server (request/response/print), e.g. I don't know which print precisely occurs in a certain decision. today it is not possible to correlate these other logs with the decision log.

I agree that this correlation just makes sense if the log level is INFO or higher because if the log level is lower there is nothing to correlate. So, maybe the log level needs to be considered when adding the req_id to the decision logs.

humbertoc-silva avatar Aug 12 '22 21:08 humbertoc-silva

Thanks @humbertoc-silva! I guess the next question then would be — when decision logging is enabled, is there any real value in keeping the "request log" enabled too? AFAICS, there's not much added in terms of data not already present in the decision log itself. Firing off a request with OPA running as opa run --server --set decision_logs.console=true:

request log

{
  "client_addr": "127.0.0.1:53452",
  "level": "info",
  "msg": "Sent response.",
  "req_id": 1,
  "req_method": "GET",
  "req_path": "/v1/data/policy/response",
  "resp_bytes": 54,
  "resp_duration": 0.6725,
  "resp_status": 200,
  "time": "2022-08-12T23:22:27+02:00"
}

decision log

{
  "decision_id": "163be9bb-b8e7-4324-a99b-9263e498a116",
  "labels": {
    "id": "d71df7ba-b0ba-40ca-a3de-4e127158c7cd",
    "version": "0.44.0-dev"
  },
  "level": "info",
  "metrics": {
    "counter_server_query_cache_hit": 0,
    "timer_rego_external_resolve_ns": 250,
    "timer_rego_input_parse_ns": 1708,
    "timer_rego_query_compile_ns": 79958,
    "timer_rego_query_eval_ns": 14125,
    "timer_rego_query_parse_ns": 58542,
    "timer_server_handler_ns": 257583
  },
  "msg": "Decision Log",
  "path": "policy/response",
  "requested_by": "127.0.0.1:53452",
  "time": "2022-08-12T23:22:27+02:00",
  "timestamp": "2022-08-12T21:22:27.575339Z",
  "type": "openpolicyagent.org/decision_logs"
}

If there are some error conditions that would stop e.g. the decision logger from logging at all, the normal error logging of OPA would hopefully be clear enough on what caused that error. Could definitely be some cases I'm not thinking of though 🙂

anderseknert avatar Aug 12 '22 21:08 anderseknert

Some information is present on the decision log, some not, but when we have debugging log turned on via the print built-in function it definitely won't be present, and in some cases, it could be valuable to have some data printed to help on troubleshooting or to understand more about the decision in complex policies and all the logs connected to facilitate searching and filtering.

Maybe I am doing misuse of the print function, but this was a way I found to get more information about the decision instead return more data in the Data API response. I think that is more correct to do this using logs than putting extra information on client responses.

What do you think about this?

humbertoc-silva avatar Aug 13 '22 14:08 humbertoc-silva

Seems like a simple addition to make, so if it helps you with debugging, I don't see why not :)

anderseknert avatar Aug 14 '22 19:08 anderseknert

Do you think that the req_id attribute always must be included on the decision log or when the log level is lower or equal info?

humbertoc-silva avatar Aug 15 '22 13:08 humbertoc-silva

Hi @anderseknert,

I took some time to understand the decision log plugin and how req_id is generated, it is my first contact with the OPA source code. The request ID is an attribute of LogginHandler and the requestContext.ReqID attribute is set when the server handles a new request.

I saw that we just have req_id on request context when the log level is lower than or equal to info, as you said before.

I think that this method must be updated to include the re_id on the event as well as update the EventV1 type, and the AST method too, right?

humbertoc-silva avatar Aug 17 '22 22:08 humbertoc-silva

Seems about right @humbertoc-silva 👍 I guess we'd want to create the req_id outside of the "request logger", or else it will only be seen in the decision logs if that's enabled. OTOH, perhaps that's the only time it adds value, since it's not seen anywhere else?

anderseknert avatar Aug 18 '22 08:08 anderseknert

Let me see if I got it. To always have the req_id on decision logs the ID always needs to be propagated instead to propagate it when the log level is lower or equal to info, so some change is necessary to make this independent of the log level. Here is where a new request is set with extra information based on the info log level.

At a first glance, it makes sense to have the req_id just when the other logs exist too, so we gonna have data to correlate to the decision log, and the req_id will be the attribute that joins all together.

humbertoc-silva avatar Aug 18 '22 17:08 humbertoc-silva

Agreed, if it just starts showing up in decision logs for everyone, there's probably going to be a lot of questions asked about its purpose 😅 So maybe we can start by only having it logged when the "request logger" is enabled. We can always change that later if we want.

anderseknert avatar Aug 19 '22 08:08 anderseknert

Hi @anderseknert,

I will need help to get the request ID value on the decision log plugin. I will need to do a type assertion to extract the value from the context but the type requestContext is not exported from runtime package.

Are there other places that use this information that could be used as an example? Do you have any suggestions on how I can do this?

Thank you!

humbertoc-silva avatar Aug 19 '22 15:08 humbertoc-silva

In the meantime, I tried something here. Please, check my branch out and take a look.

I moved the request context temporarily to the logging package to do a test.

humbertoc-silva avatar Aug 19 '22 18:08 humbertoc-silva

Hi @anderseknert,

Did you have a chance to take a look at my branch? As I mentioned earlier, I moved the request context to do a test but I think it is not the best place to move this code. I would like your opinion and orientation about this.

Thank you very much!

humbertoc-silva avatar Sep 02 '22 12:09 humbertoc-silva

Hey @humbertoc-silva! And sorry for the delay. Busy week with 2 releases.. 😅 Code looks like a good start to me, but @srenatus might be better equipped to say if there are problems with the approach.

anderseknert avatar Sep 08 '22 09:09 anderseknert

👀 It looks OK-ish, but I wonder if we couldn't make the RequestID part of the decision struct (*server.Info).... putting values on the context always feels a tiny bit underhanded to me 😅 Or have you already considered that approach, and encountered blockers?

srenatus avatar Sep 08 '22 09:09 srenatus

Hi guys,

I understand @anderseknert. @srenatus on the weekend I will try to implement using this approach, making the RequestID part of the decision log struct. Can I use the contributor's channel to ask questions about the implementation?

Thank you!

humbertoc-silva avatar Sep 09 '22 11:09 humbertoc-silva

Can I use the contributor's channel to ask questions about the implementation?

Of course, that's what it's for 😃

srenatus avatar Sep 09 '22 11:09 srenatus

Hi guys,

I created a new branch and implemented the feature following the @srenatus tip (making the RequestID part of the decision struct).

Could you, @anderseknert and @srenatus, check my new branch out, please?

Thank you.

humbertoc-silva avatar Sep 21 '22 19:09 humbertoc-silva

You could also open a PR right away. It's easier to comment on that etc

srenatus avatar Sep 22 '22 05:09 srenatus