PFR - option to define & exclude dedicated resources/properties from CFN Drift Detection - e.g. due to AWS::EC2::SecurityGroupIngress references
Name of the resource
AWS::EC2::SecurityGroupIngress
Resource name
No response
Description
Summary & Starting point:
This idea is a fairly generic one. There are different circumstances in which CloudFormation Drift Detection reports a DRIFT although nothing has actually drifted. Reasons could be:
- (1) You have some "cross-stack references", like an AWS::EC2::SecurityGroupIngress in Stack-B that references a (Security)GroupId deployed in Stack-A, as discussed here: #1198. More generally, any case where you are facing a limitation of CloudFormation Drift Detection.
- (2) Your stack has many different resources (e.g. when using the CDK construct pattern) and only one of them is hit by a Drift Detection false-positive bug. [If you'd like to get an idea of some examples, check inside the repo.]
After brainstorming with different people in the CloudFormation area (e.g. @kanitkah @ziarrdan @LariWo), we came up with the idea of an "exclusion list" that is part of the CloudFormation template. That way, we could add this "exclusion list for resources inside the stack which should not be checked" to our version-controlled CFN/CDK projects/files.
The benefit for case (2) is clear: the resource is removed from the list once the GitHub issue with the false positive is solved. For case (1), I'd like to add an example here based on #1198:
- Our Stack-A has a SecurityGroup (e.g. for an EC2 instance with a legacy database, later referred to by Logical ID DatabaseSecGroupIngressStaticInSync). None of the resources and stacks of Team-A are dynamic. Once deployed, Team-A takes care of Stack-A.
- There is a Team-B. This team uses more cloud-native patterns: they create their ECS container cluster only on request and drop this infrastructure (+ stacks, etc.) once the workload is done and no longer required.
- There is a "contract" between Team-B and Team-A: to allow Team-B's ECS components to access the database, Team-A defined a dedicated SecurityGroup for them (later named with Logical ID ThisSecGroupIsAlwaysDrifted) and shares its Physical ID (like sg-666xyz), so that Team-B can "inject" Ingress rules to access the database by defining an AWS::EC2::SecurityGroupIngress in Team-B's CFN/CDK. This pattern helps Team-A as well, since Team-B doesn't need to request new rules every time ECS becomes active and is later removed again.
- As we can see, Team-A encapsulates the access from Team-B using a dedicated SecurityGroup. The main reason is issue #1198. Whenever Team-A checks their CloudFormation stack (here called Stack-A) for drift, this SecurityGroup is always in state DRIFTED (due to #1198).
# This is "Stack-A" managed by "Team-A"
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  VpcId:
    Type: String
    Description: 'Enter the VPC ID where the Security Groups will be created'
Resources:
  # also other stuff, like EC2, ENI, Volumes etc here ...
  DatabaseSecGroupIngressStaticInSync: # Physical ID: sg-123abc
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: SecGroup for all static/normal Ingress Rules to database of Team-A
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 1521
          ToPort: 1521
          CidrIp: 192.168.0.1/32
        # and many more ...
  ThisSecGroupIsAlwaysDrifted: # Physical ID: sg-666xyz
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: SecGroup which Team-B will refer via AWS::EC2::SecurityGroupIngress from their template
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: "-1"
          CidrIp: 127.0.0.1/32
          Description: dummy rule to create an empty SecGroup for Team-B
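For completeness, Team-B's side of the contract could look roughly like the sketch below. It is not taken from a real template: the stack, the parameter TeamAInjectionSecGroupId and the Logical IDs EcsTaskSecGroup and DatabaseAccessFromEcs are made up for illustration, and port 1521 is simply reused from Stack-A.

```yaml
# Hypothetical "Stack-B" managed by "Team-B" (all names are illustrative)
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  VpcId:
    Type: String
    Description: VPC ID shared with Stack-A
  TeamAInjectionSecGroupId:
    Type: String
    Description: Physical ID of ThisSecGroupIsAlwaysDrifted from Stack-A, e.g. sg-666xyz
Resources:
  EcsTaskSecGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: SecGroup attached to the ECS tasks of Team-B
      VpcId: !Ref VpcId
  DatabaseAccessFromEcs:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Properties:
      # The target group lives in Stack-A but is modified from Stack-B
      GroupId: !Ref TeamAInjectionSecGroupId
      IpProtocol: tcp
      FromPort: 1521
      ToPort: 1521
      SourceSecurityGroupId: !GetAtt EcsTaskSecGroup.GroupId
      Description: Allow Team-B ECS tasks to reach the database of Team-A
```

While a stack like this is deployed, the actual rules on sg-666xyz differ from what Stack-A declares, which is why Stack-A reports ThisSecGroupIsAlwaysDrifted as drifted.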
Ideas for this PFR:
- We should be able to add the following definitions within the CloudFormation template (stack):
  - (I.) Exclude resource AWS::EC2::SecurityGroup with Logical ID ThisSecGroupIsAlwaysDrifted from Drift Detection.
  - (II.) Exclude only dedicated properties for all resources of this type within this stack, e.g. due to a known, temporary false-positive issue in Drift Detection.
- Conversely,
  - Other resources of the same type, like Logical ID DatabaseSecGroupIngressStaticInSync, should still be part of Drift Detection.
  - Further resources & properties of other types (AWS::RDS::DBInstance, EC2::Instance) are also still checked by Drift Detection.
- Any pattern from this PFR must be compatible with CloudFormation templates managed and generated by CDK.
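To make the ideas above more concrete, here is one possible shape for such an exclusion list, placed in the template-level Metadata section. This is purely a sketch: the key DriftDetectionExclusions and everything below it are invented for this example, and CloudFormation does not interpret them today. The Tags entry is only a placeholder for "some property with a known false positive".

```yaml
# HYPOTHETICAL sketch: CloudFormation does not evaluate this block today
Metadata:
  DriftDetectionExclusions:
    # (I.) exclude a whole resource by Logical ID
    Resources:
      - LogicalId: ThisSecGroupIsAlwaysDrifted
        Reason: 'Ingress rules are injected by Team-B from their own stack, see #1198'
    # (II.) exclude single properties for all resources of one type in this stack
    Properties:
      - ResourceType: 'AWS::EC2::SecurityGroup'
        Property: Tags
        Reason: 'Placeholder for a known, temporary false positive'
```

Because Metadata is an ordinary template section, a block like this could in principle also be emitted by CDK, and the Reason fields keep the explanation for each exclusion under version control.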
Benefits from this PFR:
- You are able to maintain known Drift Detection issues within your IaC. With this new functionality, the number of stacks in state DRIFTED will be significantly reduced. Furthermore, everybody can "read" within the CloudFormation template (along with comments with GitHub links or internal notes, etc.) why this drift happened (bug, limitation, own use case which requires changes outside of CloudFormation via Console, etc.) and that it is permanently excluded from Drift Detection.
- Please note that this should be a generic pattern all over CloudFormation. My example with Team-A/Team-B using AWS::EC2::SecurityGroupIngress is just one (real-life) use case. There are many more ;)
Other Details
Happy to discuss further ideas or difficulties for such a pattern, either here on GitHub or via the CloudFormation Discord https://discord.gg/9zpd7TTRwq :)
This PFR covers multiple use cases. It is especially relevant if you create an EC2 Security Group in one stack and then want to apply Security Group Ingress or Egress rules to that EC2 Security Group from another stack.
The EC2 Security Group would show as drifted because of the limitation highlighted here.
Drift Detection is the most consistent way of ensuring that the CloudFormation stack resources are in a nominal state.
You can also see the need for this feature with the AWS::RDS::DBInstance resource if you set the "AutoMinorVersionUpgrade" property to true. As soon as a new minor version is released and the instance is upgraded, the CloudFormation template needs to be updated, because the difference in EngineVersion shows up as drift.
While the AWS::RDS::DBInstance issue can be solved via a Transform, it would be better to be able to mark either a specific property within a resource or the resource itself as exempt from Drift Detection. That would make Drift Detection for RDS resources significantly more informative and viable to use.
Ideally, I would love to see a common CloudFormation resource attribute, similar to the DependsOn attribute, where you specify which properties are exempted from Drift Detection, or just declare "All" to remove the resource from Drift Detection entirely.
The number of problems Drift Detection runs into with edge cases like the ones highlighted in my comment undermines confidence in the feature. A new way of defining what is and isn't checked within a resource would breathe new life into Drift Detection, giving organizations the ability to define which drift they actually want to detect and which they want to ignore.
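The attribute idea from this comment might look roughly like the sketch below. The attribute name DriftDetectionExemptions is invented for this example and is not recognized by CloudFormation today; the Logical ID LegacyDatabase and the engine and version values are illustrative only.

```yaml
# HYPOTHETICAL sketch: "DriftDetectionExemptions" is not an existing resource attribute
Resources:
  LegacyDatabase:
    Type: 'AWS::RDS::DBInstance'
    # Exempt only the property that changes on automatic minor upgrades
    DriftDetectionExemptions:
      - EngineVersion
    Properties:
      Engine: postgres            # illustrative value
      EngineVersion: '15.4'       # illustrative value
      AutoMinorVersionUpgrade: true
      # remaining required properties omitted for brevity
  ThisSecGroupIsAlwaysDrifted:
    Type: 'AWS::EC2::SecurityGroup'
    # Exempt the whole resource
    DriftDetectionExemptions: All
    Properties:
      GroupDescription: SecGroup whose rules are managed from another stack
```

Sitting next to Type and DependsOn, the exemption stays visible right where the resource is declared, and all other properties of LegacyDatabase would still be checked.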
For case 2, pushing that work onto every affected CloudFormation drift detection user feels like a crutch to mitigate a lack of better centralized drift solutions.
It's been years since I worked on CloudFormation, but drift detection was much more reactive than proactive then. Customers complained about drift bugs, and then CloudFormation would investigate and react case by case.
Instead, drift detection should monitor property types with high drift rates. Many customer-reported drift bugs had 100% drift rates; obviously, not every single customer using that property type manually drifted out-of-band. Drift detection for those property types should be automatically disabled for everyone, and CloudFormation can asynchronously work through that list to re-enable them without customers being involved.
Thanks Pat for your valuable input. I agree.
I'm adding another example where, depending on your individual use case, an exclusion within CFN Drift Detection would be great:
It concerns Drift Detection for the Value property of resource AWS::SSM::Parameter.
- If you just create entries in Parameter Store via CFN or CDK, you sometimes use a freely chosen initial Value for the parameter (like Value "ChangeMe") in your CFN template.
- Sometimes you don't want to store the exact Value in your code repository / version control, like git.
- It could be that your Lambda code stores a continuously changing Value inside the parameter.
- It could be that other users want to change the Value according to their needs via the AWS Console.
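A minimal template for this situation might look like the following sketch. The Logical ID AppSetting and the parameter name /example/app/setting are made up for illustration.

```yaml
# Sketch of a parameter whose Value is expected to change outside CloudFormation
Resources:
  AppSetting:
    Type: 'AWS::SSM::Parameter'
    Properties:
      Name: /example/app/setting   # hypothetical name
      Type: String
      Value: ChangeMe              # placeholder, overwritten later by Lambda or via Console
```

As soon as the Value is changed outside of CloudFormation, Drift Detection reports this resource as MODIFIED, even though the change is intended.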