Export Decision Logs to S3
What is the underlying problem you're trying to solve?
I would like to export the decision log history directly into an S3 bucket. This would allow decision logs to be queried using AWS Glue.
Describe the ideal solution
This is my example config which I hoped would work but it seems S3 can only be used to pull bundles currently.
```yaml
services:
  s3:
    url: https://my-bucket-name.s3.eu-west-2.amazonaws.com/
    credentials:
      s3_signing:
        web_identity_credentials:
          aws_region: eu-west-2

bundles:
  authz:
    service: s3
    resource: policies/default.tar.gz
    polling:
      min_delay_seconds: 60
      max_delay_seconds: 120

plugins:
  envoy_ext_authz_grpc:
    addr: :9191
    path: example/ingress/allow
    dry-run: false
    enable-reflection: false

decision_logs:
  service: s3
  console: true
```
The ideal solution would be as above, i.e. setting `decision_logs.service` to the S3 service, with the upload to S3 handled automatically behind the scenes.
Additional Context
In my example, the service is already configured to pull bundles from S3, so uploading the logs to the same bucket would be ideal. As a workaround I could create a service that receives the decision logs via HTTP and uploads them to S3, but this seems like unnecessary extra complexity.
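For reference, such a workaround service would receive OPA's decision log uploads, which the `decision_logs` plugin sends as gzip-compressed JSON arrays of events. A minimal sketch of just the decode step (the actual S3 upload is omitted here):

```python
import gzip
import json

def decode_decision_logs(body: bytes) -> list:
    """Decompress an OPA decision log upload (a gzip-compressed JSON array)."""
    return json.loads(gzip.decompress(body))

# Round-trip a fabricated upload payload to show the shape of the data.
events = [{"decision_id": "abc123", "path": "example/ingress/allow", "result": True}]
payload = gzip.compress(json.dumps(events).encode())
assert decode_decision_logs(payload) == events
```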
Thanks for filing this @danoliver1!
I'm curious about what the implementation would look like. Decision log uploads normally happen quite frequently, and with many running OPA instances for a given service, you'd see a lot of these log files being uploaded. Would each upload need a unique name, or is there some functionality to append to files in S3?
Since we'd potentially have many OPAs writing to the same bucket, we can't just attach an incrementing number to each upload, but would need some naming scheme like `opa-$hostname-$random-uuid.tar.gz`. Would it then be the job of the log collector to read through all of the uploads and try to order the events based on their contents? I haven't worked with AWS Glue, so maybe it provides all the answers to this :)
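A scheme like that is cheap to generate per upload. A minimal sketch of the `opa-$hostname-$random-uuid.tar.gz` naming suggested above (the exact pattern is just the one proposed in this thread, not anything OPA does today):

```python
import socket
import uuid

def upload_object_name() -> str:
    """Build a collision-resistant object name: opa-$hostname-$random-uuid.tar.gz."""
    return f"opa-{socket.gethostname()}-{uuid.uuid4()}.tar.gz"

# Two uploads from the same host still get distinct names thanks to the UUID.
a, b = upload_object_name(), upload_object_name()
assert a != b
```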
If these are all noob questions/concerns, I'd be happy to learn something new! But either way, I think we should try to include a solution proposal for how this would work in practice.
The quantity of files wouldn't be an issue as they can be crawled very easily. I think it could work nicely with write once files (i.e. no appending).
Maybe a folder structure like below could be implemented? S3 partitions by the key prefix (folder names) so this kind of structure would be better for performance and would make it easier to search by date.
```
logs/
  2022/
    07/
      13/
        10-50-55_8a8b90ead596.json
        10-51-01_f5fbc89e83e6.json
```
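Building such a date-partitioned key from an event timestamp is straightforward; a sketch, where the `logs/` prefix and filename shape follow the layout above and `event_id` is whatever unique id the uploader assigns:

```python
from datetime import datetime, timezone

def decision_log_key(ts: datetime, event_id: str) -> str:
    """Build an S3 key partitioned by date: logs/YYYY/MM/DD/HH-MM-SS_<id>.json."""
    return ts.strftime("logs/%Y/%m/%d/%H-%M-%S") + f"_{event_id}.json"

key = decision_log_key(datetime(2022, 7, 13, 10, 50, 55, tzinfo=timezone.utc),
                       "8a8b90ead596")
print(key)  # logs/2022/07/13/10-50-55_8a8b90ead596.json
```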
SGTM, @danoliver1 👍
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.