Improvement: Keep Track of S3 Processing
Background
When StreamAlert processes files from S3, the number of parsed records can be very high (over 10k). In these cases, the rule processor can time out during processing, commonly because Firehose backs off too aggressively while sending record batches, which results in an Invocation Failure. Lambda then retries the same request, sending the same batch of records back out repeatedly.
Steps to Reproduce
Configure the rule_processor to accept s3_events as input with very large files, and also have Firehose enabled.
Desired Change
The goal is to avoid processing duplicate records from S3. This could be accomplished with something like the following (see the sketch after this list):
- Create a DDB table, with the S3 object name as the primary key
- Track how many alerts were sent, and whether alert processing completed
- Track how many records were sent to Firehose
- When reprocessing a previously seen S3 object, load its saved state and resume where processing left off
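
A minimal sketch of what this checkpointing could look like, assuming a hypothetical DynamoDB table named `s3_processing_state` keyed on the S3 object name; the attribute names, table name, and `send_to_firehose` placeholder are all illustrative and not part of StreamAlert:

```python
import boto3

TABLE_NAME = 's3_processing_state'  # hypothetical table name

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)


def load_state(object_key):
    """Return saved progress for this S3 object, or a fresh record."""
    response = table.get_item(Key={'object_key': object_key})
    return response.get('Item', {
        'object_key': object_key,
        'records_sent': 0,   # records already delivered to Firehose
        'alerts_sent': 0,    # alerts already dispatched
        'completed': False,  # True once the object was fully processed
    })


def save_state(state):
    """Persist progress so a retried invocation can resume."""
    table.put_item(Item=state)


def send_to_firehose(record):
    """Placeholder for the real Firehose delivery logic."""
    pass


def process_s3_object(object_key, records):
    """Process records from an S3 object, skipping work already done."""
    state = load_state(object_key)
    if state['completed']:
        return  # object already fully processed; skip duplicate work

    for index, record in enumerate(records):
        # Skip records that a previous (timed-out) invocation already sent
        if index < state['records_sent']:
            continue
        send_to_firehose(record)
        state['records_sent'] = index + 1
        save_state(state)  # checkpoint so a retry can resume here

    state['completed'] = True
    save_state(state)
```

Checkpointing after every record is only for illustration; in practice the state would likely be saved per Firehose batch to keep DynamoDB writes reasonable.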