Support lookup metadata from file/payload to enrich events for sources such as AWS ELB

Open shaeqahmed opened this issue 1 year ago • 1 comments

Problem

AWS ELB does not include the AWS account ID in each event payload; this information appears only in the S3 path, e.g. aws-elb-logs/<account-id>/.... As a user, I would like to be able to filter and narrow down my AWS ELB logs by querying on an AWS account ID field.

Ideas

To support this generically in our VRL transform without hurting performance (i.e. without requiring thread synchronization via a mutex in the hot path), we would need to add a custom VRL function for looking up metadata (get_payload_metadata_field). We would also add a function like set_payload_metadata that could be called from the select_table_from_payload_metadata VRL expression to parse and populate file-level metadata that the corresponding events can look up later. For example, for AWS ELB this might look like (pseudoscript):

ingest:
    select_table_from_payload_metadata: |
        # inject additional metadata by parsing s3 key and extracting aws_account_id and adding it to the `__metadata` special field
        .__metadata |= parse_regex(.__metadata.s3.key, r'/AWSLogs/(?P<aws_account_id>\d{12})/elasticloadbalancing/') ?? {}

        # just return the default table
        "default"
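
To make the intended effect of the `parse_regex(...) ?? {}` merge concrete, here is a minimal Python sketch of the same extraction. It is only an illustration, not Matano code: the `enrich_metadata` helper and the example S3 key are hypothetical, and only the regex is taken from the pseudoscript above.

```python
import re

# Same pattern as the pseudoscript above; the named group becomes a metadata key.
ACCOUNT_ID_RE = re.compile(
    r"/AWSLogs/(?P<aws_account_id>\d{12})/elasticloadbalancing/"
)


def enrich_metadata(metadata: dict) -> dict:
    """Return a copy of the file metadata with regex captures merged in.

    Hypothetical helper: if the S3 key does not match, the metadata is
    returned unchanged, mirroring the `?? {}` fallback in the pseudoscript.
    """
    match = ACCOUNT_ID_RE.search(metadata["s3"]["key"])
    captures = match.groupdict() if match else {}
    # Dict merge, analogous to the `|=` operator in the VRL pseudoscript.
    return {**metadata, **captures}


# Hypothetical S3 key, used for illustration only.
example = {"s3": {"key": "my-prefix/AWSLogs/123456789012/elasticloadbalancing/file.log.gz"}}
enriched = enrich_metadata(example)
```

Because a non-matching key falls back to an empty merge, files from sources with a different path layout would simply carry no `aws_account_id`.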

Then from the transform we could use this info like:

transform: |
    .cloud.account.id = get_payload_metadata_field("aws_account_id")
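
One way to picture how this lookup could avoid a mutex in the hot path is to bind a read-only snapshot of the file-level metadata to each file's transform run. The Python sketch below rests on that assumption. The function names mirror the proposed VRL functions, but the implementation is hypothetical and does not reflect how Matano works internally.

```python
import copy
from typing import Callable, Iterable, Iterator, Optional


def make_get_payload_metadata_field(metadata: dict) -> Callable[[str], Optional[str]]:
    """Build a lookup bound to one file's metadata (hypothetical).

    The snapshot is taken once per file and never mutated afterwards,
    so per-event reads need no lock.
    """
    snapshot = dict(metadata)

    def get_payload_metadata_field(name: str) -> Optional[str]:
        # Returns None for a missing field.
        return snapshot.get(name)

    return get_payload_metadata_field


def transform_file(events: Iterable[dict], metadata: dict) -> Iterator[dict]:
    """Apply the transform to every event parsed from a single file."""
    get_field = make_get_payload_metadata_field(metadata)
    for event in events:
        out = copy.deepcopy(event)  # leave the input event untouched
        # Equivalent of: .cloud.account.id = get_payload_metadata_field("aws_account_id")
        out.setdefault("cloud", {}).setdefault("account", {})["id"] = get_field("aws_account_id")
        yield out


# Hypothetical events and metadata, used for illustration only.
events = [{"request": "GET /"}, {"request": "POST /x"}]
enriched = list(transform_file(events, {"aws_account_id": "123456789012"}))
```

Every event from the same file receives the same account ID, and the metadata is parsed once per file rather than once per event.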

This is a bit too complicated for my liking, though, and it is a pretty niche use case (the only current applications are AWS ELB and, potentially, Route53). Generally, other sources do (and should) include important metadata in the event itself rather than relying on context bubbled up from the path, so I'd like to hold off on implementing a solution until it becomes clearer that it is worth it.

shaeqahmed, Feb 15 '23 06:02