
[improvement] Have metrics for time-outs and out of memories

Open 0xdabbad00 opened this issue 6 years ago • 0 comments

Background

See discussion in the thread at https://streamalert.slack.com/archives/C3BHE2Z0S/p1559152670016000

In speaking with @ryandeivert there, we noted that certain failures appear only as counts in the error metric, which makes it difficult to identify their cause. You can create a metric filter on the phrase "Task timed out" for time-outs and on "Process exited before completing request" for out-of-memory (OOM) errors. AWS has not provided a metric for these, or a dimension on the error metric that distinguishes them. Ryan mentioned: "thinking about it now, we should probably just add these to streamalert by default."
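To make the metric filter idea concrete, here is a minimal sketch of what the two filters could look like when created with boto3. It is not how StreamAlert does (or would) implement this; the metric names, the `StreamAlert/Custom` namespace, and the helper names `build_metric_filters` and `apply_metric_filters` are all made up for illustration. The only parts taken from this issue are the two log phrases. It assumes the function logs to the default `/aws/lambda/<function name>` log group.

```python
# Log phrases come from the issue text; every name below is hypothetical.
FAILURE_PHRASES = {
    "LambdaTimeouts": "Task timed out",
    "LambdaOutOfMemory": "Process exited before completing request",
}


def build_metric_filters(function_name, namespace="StreamAlert/Custom"):
    """Return one put_metric_filter argument dict per failure phrase."""
    # Assumes the default Lambda log group naming.
    log_group = f"/aws/lambda/{function_name}"
    filters = []
    for metric_name, phrase in FAILURE_PHRASES.items():
        filters.append({
            "logGroupName": log_group,
            "filterName": f"{function_name}-{metric_name}",
            # Double quotes make CloudWatch Logs match the exact phrase.
            "filterPattern": f'"{phrase}"',
            "metricTransformations": [{
                "metricName": metric_name,
                "metricNamespace": namespace,
                # Emit 1 for every matching log event.
                "metricValue": "1",
            }],
        })
    return filters


def apply_metric_filters(function_name):
    """Create the filters in CloudWatch Logs (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    logs = boto3.client("logs")
    for kwargs in build_metric_filters(function_name):
        logs.put_metric_filter(**kwargs)
```

Calling `apply_metric_filters("my_function")` would create two custom metrics for that function, which can then be graphed or alarmed on separately from the generic error count.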

Are you on the latest version of StreamAlert? Yes

Steps to Reproduce

Have one of the Lambdas time out or run out of memory. This results in an error, but no metric indicates why it occurred. If you had 100 errors, you don't know whether all of them are time-outs, all OOM, a split between the two, or something else.
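If you don't have a failing function at hand, a throwaway handler like the sketch below could be used to trigger either failure on demand. This is a hypothetical test function, not part of StreamAlert; the `mode`, `sleep_seconds`, and `max_mb` event keys are invented for this example.

```python
import time


def handler(event, context):
    """Hypothetical Lambda handler that fails on purpose.

    event["mode"] == "timeout": sleep past the configured timeout.
    event["mode"] == "oom":     allocate memory until the process is killed.
    """
    mode = event.get("mode", "timeout")

    if mode == "timeout":
        # Default is longer than the maximum Lambda timeout (900 seconds).
        time.sleep(event.get("sleep_seconds", 1000))
        return {"mode": mode, "finished": True}

    if mode == "oom":
        chunk_mb = 10
        # max_mb caps the allocation for local dry runs; omit it to keep
        # allocating until the function runs out of memory.
        max_mb = event.get("max_mb")
        hoard = []
        while max_mb is None or len(hoard) * chunk_mb < max_mb:
            hoard.append(bytearray(chunk_mb * 1024 * 1024))
        return {"mode": mode, "allocated_mb": len(hoard) * chunk_mb}

    raise ValueError(f"unknown mode: {mode!r}")
```

Invoking it a few times with `{"mode": "timeout"}` and `{"mode": "oom"}` produces a mix of errors that all land in the same error count, which is the situation described above.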

Desired Change

Have a way of knowing when you've had time-outs or out-of-memory errors, beyond just knowing that an error occurred.

0xdabbad00 • May 30 '19 18:05