Add cdk synth before running checkov security scans

Open zsaltys opened this issue 1 year ago • 1 comments

Currently the GitHub action uses checkov to scan the data.all repo. However, it cannot find much because most of the stacks are generated with CDK.

We should generate the stacks using cdk synth before running checkov scans. I was able to do it on our own custom build environment which does not use github actions. What I had to do:

  • install cdk cli
  • install python dependencies: pip install -r ./deploy/requirements.txt
  • prepare some valid version of cdk.json pointing to a real environment: cp cdk.json.STAGING cdk.json
  • run cdk synth: cdk synth

The tricky bit is that cdk synth needs AWS credentials, and it connects to an actual account to check a few things. I've created a basic role template that is required to run cdk synth successfully: cdk-synth-example-role.txt.
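The manual steps above could be scripted roughly as follows. This is only a sketch: installing the CDK CLI through npm, the final checkov invocation against cdk.out, and the function names are assumptions, not something taken from the build environment described here.

```python
import subprocess


def synth_and_scan_commands(cdk_json_source="cdk.json.STAGING"):
    """Return the commands for the manual steps, in the order they should run."""
    return [
        # Assumption: the CDK CLI is installed globally through npm.
        ["npm", "install", "-g", "aws-cdk"],
        ["pip", "install", "-r", "./deploy/requirements.txt"],
        # Use a cdk.json that points to a real environment.
        ["cp", cdk_json_source, "cdk.json"],
        # Needs AWS credentials; writes the templates to ./cdk.out by default.
        ["cdk", "synth"],
        # Assumption: scan the synthesized templates as CloudFormation.
        ["checkov", "--directory", "cdk.out", "--framework", "cloudformation"],
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_all(synth_and_scan_commands())` from the repository root would execute the steps in order. It still requires valid AWS credentials for the synth step.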

There have been numerous findings reported by checkov, so this is definitely worthwhile. Additionally, I'm thinking about how we could generate templates for datasets and environments, as these are created at runtime while data.all is in use.

zsaltys avatar Jan 31 '24 12:01 zsaltys

Hi @zsaltys thanks for the issue. We faced similar issues when introducing CDK Nag to test the infra. All the alternatives are described in the #767 pull request. We went to the root of the problem which is the need for credentials to look up AWS resources. We pass the context object as a variable of cdk deploy based on the environment variable GITHUB_ACTIONS that is defined in the container of the GitHub action. We could use the same approach for checkov.
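One possible reading of that approach, sketched for a synth step that checkov could reuse: when GITHUB_ACTIONS is set to "true" (GitHub Actions sets this in its runners), extra context values are passed on the command line so that no credentialed lookups are needed. The function name and the context keys are hypothetical, and the actual mechanism in #767 may differ.

```python
import os


def synth_command(environ=None, ci_context=None):
    """Build a `cdk synth` command, adding context values only on GitHub Actions.

    `ci_context` is a hypothetical mapping of context keys to pre-supplied
    values that would otherwise be looked up in a real AWS account.
    """
    environ = os.environ if environ is None else environ
    command = ["cdk", "synth"]
    # GitHub Actions sets GITHUB_ACTIONS=true inside its runners.
    if environ.get("GITHUB_ACTIONS") == "true":
        for key, value in (ci_context or {}).items():
            command += ["--context", f"{key}={value}"]
    return command
```

Outside GitHub Actions the function returns a plain `cdk synth`, so local runs keep using real credentials and lookups.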

The second part of this issue is the synthesis of the Environment, Dataset, etc. stacks for scanning. They are core infrastructure that should also be scanned. In this case, instead of running cdk synth, I propose using the cdk App and Template classes in a similar way to the integration tests (for example tests/modules/notebooks/cdk/test_sagemaker_notebook_stack.py or tests/modules/mlstudio/cdk/test_sagemaker_studio_stack.py). Then we can run checkov and CFN-Nag directly on the generated CloudFormation templates.
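A minimal sketch of that idea, assuming aws-cdk-lib v2 is installed: a stack is synthesized in-process with App and Template, the template is written to disk, and checkov is pointed at the file. The helper names and the `stack_factory` callable are hypothetical. The real data.all stacks need the mocks that the integration tests set up, which are not shown here.

```python
import json
from pathlib import Path


def synth_template(stack_factory):
    """Synthesize one stack in-process and return its template as a dict.

    `stack_factory` is a hypothetical callable that takes a cdk App and
    returns the stack to scan, for example a dataset or environment stack.
    """
    # Imported lazily so the helpers below work without aws-cdk-lib installed.
    from aws_cdk import App
    from aws_cdk.assertions import Template

    app = App()
    stack = stack_factory(app)
    return Template.from_stack(stack).to_json()


def write_template(template, out_dir, name):
    """Write a template dict to <out_dir>/<name>.template.json and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.template.json"
    path.write_text(json.dumps(template, indent=2))
    return path


def checkov_command(template_path):
    """Build the checkov command that scans a single CloudFormation template."""
    return ["checkov", "--framework", "cloudformation", "--file", str(template_path)]
```

A scan job could call `synth_template` for each runtime stack, pass the result to `write_template`, and run the command from `checkov_command` with `subprocess.run`. No AWS credentials are needed as long as the stacks perform no context lookups.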

Happy to collaborate on these features

dlpzx avatar Feb 05 '24 13:02 dlpzx

@anmolsgandhi @mourya-33 @noah-paige updated this to mention reworking checkov scanning to use checkov baselines as I couldn't find any other open tickets for that.

zsaltys avatar Jul 26 '24 13:07 zsaltys