AutoSpotting
Inability to DescribeRegions does not cause panic
Issue type
Bug Report
Build number
Custom build of 9b438dc with no diff.
Environment
- AWS region: us-east-1
- Type of environment: VPC
Summary
The AutoSpotting Lambda logs errors when it fails to describe regions, but it does not panic. This means that unless you have something monitoring your logs for errors, AutoSpotting may silently fail (since it exits normally, AWS will not consider the Lambda invocation to be an error).
In other places AutoSpotting panics, which has the desirable effect of informing AWS that the Lambda invocation failed. AutoSpotting is inconsistent about this, however: in cases like this one it only logs the error and exits normally.
Relevant code: https://github.com/AutoSpotting/AutoSpotting/blob/9b438dc47cbfd9587b042d5b2aaac2765c28a4e0/core/main.go#L39-L44
Steps to reproduce
Remove the ec2:DescribeRegions permission.
Expected results
Error count goes up. Alarms built off error count go off.
Actual results
No errors. Alarms do not fire despite the error being in the logs.
Thanks for reporting this issue, please create a PR with a fix.
@cristim The DescribeRegions case is easy enough to fix, since there's no reason not to exit with an error code in that case. But what about errors that aren't necessarily fatal? For example, failing to describe a CloudFormation stack. Presumably such an error does not warrant exiting immediately, and we should continue on instead.
Is there a recommended way to monitor AutoSpotting for these kinds of errors? For example, are the logs consistent enough that one can monitor for the string "Failed"?
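As a rough sketch of the log-scanning approach asked about here, the following Go snippet counts the lines of a log text that contain a marker string. The function name `countMatches` and the sample log lines are assumptions made for illustration; whether AutoSpotting's real log lines contain "Failed" consistently is exactly the open question above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countMatches returns how many lines of logText contain marker.
func countMatches(logText, marker string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		if strings.Contains(sc.Text(), marker) {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical log lines, not copied from real AutoSpotting output.
	sample := "Failed to describe regions: UnauthorizedOperation\n" +
		"processing region us-east-1\n"
	fmt.Println(countMatches(sample, "Failed"))
}
```

A string match like this is only as reliable as the log wording, which is why failing the invocation itself is the more robust signal.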
To be honest I don't think it's critical to handle/monitor these cases. So far it was enough to assume the IAM policy is correct; people are not supposed to change it.
But if you strongly believe it should be handled differently feel free to send a PR and I'll gladly accept it.
But I would like to learn more about your use case for these requirements.
> But I would like to learn more about your use case for these requirements.
I'm setting up AutoSpotting in a multi-tenant environment where I don't want misconfiguration errors, e.g. the wrong TAG_FILTERS, to mess with other ASGs. So I'm limiting the IAM permissions to specific resources. This means I have to use a custom CloudFormation stack, which means I may get things wrong 😉.
> so far it was enough to assume the IAM policy is correct
Authorization errors are just one example. AutoSpotting could also run into rate limiting errors, AWS service limits, etc. I think those are worth monitoring, no?
I see. Good luck with that project, it sounds like fun!
In a previous setup we used to monitor all these using a custom Splunk search executed over the CloudTrail logs. But AutoSpotting was always executed with the upstream permissions.
Once you're done, I'd love it if you could share how you did this; it might be useful for other folks as well.
hi @gabegorelick,
I'll close this for now but I'd love to have a chat in more detail about your use case of using AutoSpotting in such a multi-tenant setup.