cdk-nag feat(nag-pack): enable extensibility of rule application

[edited to remove 2nd commit] Currently cdk-nag only supports "blocking" validation scenario, in which cdk synth will fail the build if any Error level messages have been reported (unless --ignore-errors is passed), and Warning level is strict mode. Additionally, the detailed validation results are serialized as a single message string value, which prevents ability to utilize the granular details for other forms of reporting/tooling outside of cdk-nag.

My team is looking at building automated tooling (such as security threat models, matrixes, event flow, diagrams, etc) and we want developers to have early notification of security/compliance vulnerabilities while rapidly developing applications, but we don't want to block them. If --ignore-errors is enabled, then all native cdk validations errors would also be ignored (which could result in broken deployment). Ideally developers/tooling can enable/disable just cdk-nag from blocking the synth while still providing messaging/reporting/etc to users/tooling.

This PR provides the following:

1: Make NagPacks more extensible and controllable by splitting out the functionality of `NagPack.applyRule`

By splitting this out, tooling could overwrite/extend the specific functionality necessary without rewriting the core rule logic.

Example: extend an existing NagPack and overwrite the annotate method to prevent blocking builds. Extend the reportNonCompliant function to generate a custom report, etc.

Having distinct methods for each action makes it easier for developers to extends packs to generate additional forms of reporting without duplicating the core logic.

This change is non-functional change; simply a refactor to support more extensibility and control

Oct 06 '22 09:10 JeremyJonas

Thanks for the detailed write up! As you noted these are three separate changes and I think they should be addressed in different PRs.

I'll do a more detailed review later, but I have some general comments

extend an existing NagPack and overwrite the annotate method to prevent blocking builds

From what I understand the intent is to make it so that developers can retroactively fix violations after finishing development/prototyping while still getting reporting. I acknowledge that developers can use --ignore-errors, but I think that's a consequence of cdk-nag using Aspects, not something that should be built into the existing functionality. Just changing the rendering of Annotations doesn't change whether a rule is considered an Error or Warning according to the pack.

Suppression are a way to document in code the reason as to why the Application developer wants to ignore a violation. Maybe they'll get to it later or they don't agree with it. This seems like a way to get around using the suppression system.

Oct 06 '22 19:10 dontirun

HI @dontirun, appreciate the quick response. Regarding you questions:

I acknowledge that developers can use --ignore-errors, but I think that's a consequence of cdk-nag using Aspects, not something that should be built into the existing functionality.

This is actually a side effect of using Annotations.addError|Warning and not part of Aspects. The Aspects simply provides visitor pattern functionality.

Annotations.addError

Adds an { "error": } metadata entry to this construct. The toolkit will fail deployment of any stack that has errors reported against it.

Annotations.addWarning

Adds a warning metadata entry to this construct. The CLI will display the warning when an app is synthesized, or fail if run in --strict mode.

Annotations simply write metadata entries with native cdk types cxschema.ArtifactMetadataEntryType.ERROR|WARN|INFO which is later mapped to artifact messages during synth as SynthesisMessageLevel messages which in turn are processed when synth finishes and throw errors - which blocks the synth.

So by separating out the Annotations.addError|Warning code in cdk-nag applyRule, a tools developer could overwrite just this functionality to either not add annotations, or change the metadata type to something else. Either way, the compliance report would still be written and give developers feedback on what needs to be fixed later (before moving to production).

Suppression are a way to document in code the reason as to why the Application developer wants to ignore a violation. Maybe they'll get to it later or they don't agree with it. This seems like a way to get around using the suppression system.

Yes, this is a way to get around using the suppression system. But with the intent to only do so during "development phase", with a later "production phase" that would enforce the rules.

In development, especially for prototypes, developers are still rapidly trying out different approaches to a solution and can completely pivot from one approach to the next. During this development, we want to enable developers to not get blocked from deploying and continuing to iterate, but we do want to notify them of security/compliance concerns right away. By requiring developers to add suppressions to unblock development takes time and adds friction, especially if a it is replaced later with better approach.

Example:

class ExtendedAwsSolutionPack extends AwsSolutionPack {
  protected annotate(
    type: 'Info' | 'Warning' | 'Error',
    message: string,
    resource: CfnResource
  ): void {
    if (process.env.NODE_ENV === 'development') {
       // non-blocking metadata entry
       resource.node.addMetadta(`Nag${type}`, message)
    } else {
       // default blocking behavior
       super.annotate(...);
    }
  }
}

The use case is that a development team could create tooling that automatically adds the necessary NagPacks but with extended functionality like example above, and then based on the teams process control what is blocking/not-blocking based on the phase of development or some other heuristics.

The concern with this would be disabling non-negotiable rules (such as public S3 bucket policies) that you never want to be deployed, even in development phases. That responsibility will need to be up to the tools development to ensure, but for our use case we are looking at adding additional "severity" or "categories" to our rule pack, which would enable the tool to allow non-blocking development deployments of non-critical severity rules.

Hope that all makes sense and addresses your questions.

Oct 06 '22 23:10 JeremyJonas

@dontirun Regarding comment about 3 separate PRs

As you noted these are three separate changes and I think they should be addressed in different PRs.

There are 2 changes in this PR, and they are clearly separate in commits for review readability; I hope this is sufficient.

https://github.com/cdklabs/cdk-nag/pull/1036/commits

Oct 06 '22 23:10 JeremyJonas

@dontirun Regarding comment about 3 separate PRs

As you noted these are three separate changes and I think they should be addressed in different PRs.

There are 2 changes in this PR, and they are clearly separate in commits for review readability; I hope this is sufficient.

My mistake, it's 2 features. Different features will need to be handled in different PRs so that we can have distinct commits into the main branch in case we need to revert features. It also makes it easier to track documentation and other changes for each commit.

Oct 06 '22 23:10 dontirun

Yes, this is a way to get around using the suppression system. But with the intent to only do so during "development phase", with a later "production phase" that would enforce the rules.

In development, especially for prototypes, developers are still rapidly trying out different approaches to a solution and can completely pivot from one approach to the next. During this development, we want to enable developers to not get blocked from deploying and continuing to iterate, but we do want to notify them of security/compliance concerns right away. By requiring developers to add suppressions to unblock development takes time and adds friction, especially if a it is replaced later with better approach.

I understand that this can add friction, but I think that the suppression system should be the only to disable specific rules for a given pack. If the pack considers something blocking, then it should be documented in code when suppressing the blocking violation. The system is in place to consider potential violations before deployment.

Oct 06 '22 23:10 dontirun

@dontirun I have updated the PR to only include the 1st commit (just the refactor and not the metadata)

Oct 07 '22 00:10 JeremyJonas

@dontirun

I understand that this can add friction, but I think that the suppression system should be the only to disable specific rules for a given pack. If the pack considers something blocking, then it should be documented in code when suppressing the blocking violation. The system is in place to consider potential violations before deployment.

A good example of "development" vs "production" in which you would want to support different deployment blocking behavior is with encryption, specifically with KMS Customer Managed Keys. For production we want to enforce the best practice of using unique customer managed KMS keys for each resource, but this takes considerable development effort to implement. During development phase, we want to inform developers that before production they must implement this, but we don't want to require them to do so at this time, nor do we want them to "suppress" this because the intent is to do it before production. During early development it is fine to not encrypt test data (non-production data) via custom KMS, but it is not ok for production.

If we use the suppression functionality, the intent changes. Suppression should be used against "production" target and with the intent to "not fix in future".

Currently the only way to do this is to use --ignore-errors, which ignores all errors, not just cdk-nag errors.

Oct 07 '22 00:10 JeremyJonas

@dontirun Any further thoughts on this?

The aim to not change the intended behavior of cdk-nag and NagPack for general purpose (and the PR does not modify any functionality), the intent of the PR is just to enable tooling to utilize cdk-nag in more comprehensive development lifecycle scenarios by supporting low-level control of reporting.

Definitely open to suggestions in how to achieve this differently.

Oct 10 '22 22:10 JeremyJonas

I mainly have two concerns with this

This de-syncs the output in the console and the what is generated in reports
This makes it trivial for users to extend and change the behavior of packs while keeping the same pack name

Oct 10 '22 22:10 dontirun

@dontirun Definitely understand your concerns and they are good points. Here are my 2¢, and potential proposal to improve.

This de-syncs the output in the console and the what is generated in reports

Agree with this, but I believe this concern lies with responsibility of the developer whom is extending cdk-nag this way. As a tools developer looking to extend cdk-nag in this way, it would be my responsibility to ensure adequate messaging and safe guards are in place within the context of my changes and intent of tooling.

Example, outputting all the errors/warnings in the console directly (console.error()) rather than via Annotations, and adding messaging to developer that this id "development" phase and bypasses the blocking nature of the build.

Perhaps using a LoggingTarget pattern would be better for this, with an AnnotationTarget and ConsoleTarget, with AnnotationTarget being the default. And NagPack props can take in a target prop to specific this? That way it would enforce that the messages are being written to the console, but in the case of ConsoleTarget it would be non-blocking.

Does something like that address you concerns in this area?

This makes it trivial for users to extend and change the behavior of packs while keeping the same pack name

This can be done now by overriding applyRule on a NagPack. Yes it takes more code to just just disable Annotations this way, but a simple copy/paste would suffice with some comments to disable annotations.

I have seen developers suppress cdk-nag messaging by hacking the Annotations.of() to reach the same non-blocking aim of this PR, which is easy enough but much worse IMO:

// Really bad example of disabling blocking nag rules
class ExtendedAwsSolutionPack extends AwsSolutionPack {
  public visit(node: IConstruct ): void {
     const _AnnotationsOf = Annotations.of;
     // prevent blocking annotations by cdk-nag
     Annotations.of = (node) => ({ addError: () => {}, addWarning: () => {}, addInfo: () => {} }

     super.visit(node);

     Annotations.of = _AnnotationsOf
  }
}

Maybe the right approach is to add specific flag (eg: disableConsole) to make this more clear; but I think having more control rather than holistically disabling is still beneficial for extending cdk-nag by other tooling.

Regarding LoggingTarget pattern, it might be good to actual rewrite both Annotation (console) and "reports" handling to use this pattern (AnnotationTarget, ConsoleTarget, ComplianceCsvTarget, etc). There could be multiple nag targets (including developers could create their own), and cdk nag would fan out a standard log report structure to the loggers which would produce any time of logging. That would be very flexible and likely address both our concerns.

Similar to winston logging transports
By default the AnnotationTarget and ComplianceCsvTarget would be defined

Oct 11 '22 04:10 JeremyJonas

Perhaps using a LoggingTarget pattern would be better for this

I think the Target Pattern may be excessive in this case and doesn't feel like it would be a great user facing API. I think users care about the binary decisions here (reports/no reports, blocking/non-blocking messages). Additionally, since everything cdk-nag produces is a cycle of an Aspect (a line in the csv report, a singular annotation), the added complexity and confusing API wouldn't grant anything more than changes in filenames/locations.

Maybe the right approach is to add specific flag (eg: disableConsole) to make this more clear;

I don't think disabling console is the correct approach, but I think a flag based approach of blockingErrors (set to true by default) is probably the best way to go about this. When set to false, we still indicate the Error type in the console. This would allow for the functionality without altering the intended messaging from a Pack.

Oct 11 '22 15:10 dontirun

cdk-nag cdk-nag copied to clipboard

feat(nag-pack): enable extensibility of rule application

1: Make NagPacks more extensible and controllable by splitting out the functionality of NagPack.applyRule

cdk-nag
cdk-nag copied to clipboard

1: Make NagPacks more extensible and controllable by splitting out the functionality of `NagPack.applyRule`