
Export Decision Logs to S3

Open · danoliver1 opened this issue 1 year ago • 12 comments

What is the underlying problem you're trying to solve?

I would like to export the decision log history directly into an S3 bucket. This would allow decision logs to be queried using AWS Glue.

Describe the ideal solution

This is my example config, which I hoped would work, but it seems S3 can currently only be used to pull bundles.

services:
  s3:
    url: https://my-bucket-name.s3.eu-west-2.amazonaws.com/
    credentials:
      s3_signing:
        web_identity_credentials:
          aws_region: eu-west-2
bundles:
  authz:
    service: s3
    resource: policies/default.tar.gz
    polling:
      min_delay_seconds: 60
      max_delay_seconds: 120
plugins:
  envoy_ext_authz_grpc:
    addr: :9191
    path: example/ingress/allow
    dry-run: false
    enable-reflection: false
decision_logs:
  service: s3
  console: true

The ideal solution would be as above: set decision_logs.service to the S3 service, and uploading to S3 would be handled automatically behind the scenes.

Additional Context

In my example, the service is already configured to pull bundles from S3, so uploading the logs to the same bucket would be ideal. As a workaround I could create a service that receives the decision logs via HTTP and uploads them to S3, but this seems like unnecessary extra complexity.
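
For illustration, a minimal sketch of such a relay service is below. It assumes OPA's decision log plugin is pointed at this service and uploads gzip-compressed JSON arrays via POST /logs; the bucket name, region, port, and key layout are placeholders, not anything OPA prescribes.

// Minimal sketch of a decision-log relay: OPA uploads gzip-compressed
// JSON arrays to POST /logs, and each upload is written to S3 as-is
// under a unique, date-prefixed key.
package main

import (
    "fmt"
    "log"
    "net/http"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3/s3manager"
    "github.com/google/uuid"
)

func main() {
    sess := session.Must(session.NewSession(&aws.Config{
        Region: aws.String("eu-west-2"), // placeholder region
    }))
    uploader := s3manager.NewUploader(sess)

    http.HandleFunc("/logs", func(w http.ResponseWriter, r *http.Request) {
        // Store the payload exactly as OPA sent it (gzip-compressed JSON).
        now := time.Now().UTC()
        key := fmt.Sprintf("logs/%s/%s.json.gz",
            now.Format("2006/01/02"), uuid.NewString())

        _, err := uploader.Upload(&s3manager.UploadInput{
            Bucket:          aws.String("my-bucket-name"), // placeholder bucket
            Key:             aws.String(key),
            Body:            r.Body,
            ContentType:     aws.String("application/json"),
            ContentEncoding: aws.String("gzip"),
        })
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusOK)
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}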

danoliver1 avatar Jul 12 '22 12:07 danoliver1

Thanks for filing this @danoliver1!

I'm curious about what the implementation would look like. Decision log uploads normally happen quite frequently, and with many OPA instances running for a given service, you'd see a lot of these log files being uploaded. Would each upload need a unique name, or is there some functionality to append to files in S3?

Since we'd potentially have many OPAs writing to the same bucket, we can't just attach an incrementing number to each upload, but would need some naming scheme like opa-$hostname-$random-uuid.tar.gz, and it would be the job of the log collector to read through all of the uploads and order the events based on their contents? I haven't worked with AWS Glue, so maybe it provides the answer to this :)

If these are all noob questions/concerns, I'd be happy to learn something new! But either way, I think we should try to include a solution proposal for how this would work in practice.

anderseknert avatar Jul 12 '22 19:07 anderseknert

The quantity of files wouldn't be an issue, as they can be crawled very easily. I think it could work nicely with write-once files (i.e. no appending).

Maybe a folder structure like the one below could be implemented? S3 partitions by key prefix (folder names), so this kind of structure would be better for performance and would make it easier to search by date.

logs/
  2022/
    07/
      13/
        10-50-55_8a8b90ead596.json
        10-51-01_f5fbc89e83e6.json
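
None of this exists in OPA today; the helper below is only a hypothetical sketch of how such a key could be built, combining the date-partitioned prefix above with a hostname plus random UUID suffix so that keys stay unique across many OPA instances (the naming concern raised earlier).

// Hypothetical key builder for the proposed layout:
// logs/<year>/<month>/<day>/<HH-MM-SS>_<unique-suffix>.json
package main

import (
    "fmt"
    "os"
    "time"

    "github.com/google/uuid"
)

// decisionLogKey is an assumed helper name, not part of OPA.
func decisionLogKey(t time.Time) string {
    host, err := os.Hostname()
    if err != nil {
        host = "unknown"
    }
    return fmt.Sprintf("logs/%s/%s_%s-%s.json",
        t.UTC().Format("2006/01/02"), // date partition: logs/2022/07/13/
        t.UTC().Format("15-04-05"),   // time of upload: 10-50-55
        host,                         // per-instance suffix...
        uuid.NewString())             // ...plus a random UUID for uniqueness
}

func main() {
    // e.g. logs/2022/07/13/10-50-55_ip-10-0-1-12-<uuid>.json
    fmt.Println(decisionLogKey(time.Now()))
}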

danoliver1 avatar Jul 13 '22 10:07 danoliver1

SGTM, @danoliver1 👍

anderseknert avatar Jul 19 '22 08:07 anderseknert

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

stale[bot] avatar Aug 30 '22 22:08 stale[bot]

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

stale[bot] avatar Apr 07 '23 09:04 stale[bot]