Providing multiple directories only checks the first one (multiple times)
We have several directories we want to check in a single Checkov run, either by invoking it directly (e.g. checkov -d deploys1 -d deploys2) or via the configuration file, for example:
---
directory:
- deploys1
- deploys2
We are predominantly using the terraform framework, but I suspect that's not relevant in this case. It seems that only the first directory provided is ever checked. To demonstrate this, here are some invocations and their output:
$ checkov -d deploys1 --quiet
terraform scan results:
Passed checks: 154, Failed checks: 0, Skipped checks: 9
$ checkov -d deploys2 --quiet
terraform scan results:
Passed checks: 3274, Failed checks: 0, Skipped checks: 326
$ checkov -d deploys1 -d deploys2 --quiet
terraform scan results:
Passed checks: 154, Failed checks: 0, Skipped checks: 9
terraform scan results:
Passed checks: 308, Failed checks: 0, Skipped checks: 18
It appears that the first directory's results are added multiple times. I've tried passing a few different directories several times: once the first directory has been checked, the results for the remaining directories are printed almost immediately. I therefore believe the first directory is not really being checked multiple times; rather, its results are being printed multiple times, with the passed and skipped counts multiplied along the way (154 and 9 become 308 and 18 above).
The same behaviour can be observed whether you configure the directories on the command line or in the configuration file.
I am using Checkov 3.2.477 on macOS.
I think I understand a little of the flaw that's going on here although I'm not super familiar with the Checkov architecture. I'm guessing the RunnerRegistry is trying to accumulate check results across multiple framework runners, but unfortunately it doesn't really handle multiple independent check runs across different directories. It needs to either clear or continue accumulating check results if there are multiple directories being checked in one execution of checkov.
The question is whether that's something that should be handled inside the RunnerRegistry or if it should only be aware of one root directory path being scanned, and have the state resetting etc happen outside it (i.e. in main.py). I have an implementation of this latter approach although it doesn't feel very clean. I'll await any maintainer feedback on that.
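To make the suspected failure mode concrete, here is a toy model in Python. It is only a sketch of the guess described above: `ToyRegistry`, `FAKE_SCANS` and both helper functions are hypothetical names invented for illustration and are not taken from Checkov's source. The per-directory counts are the ones reported earlier in this issue. The model assumes two things: the registry remembers the first root it was given, and it never clears its totals between directories.

```python
# Toy model only: every name here is hypothetical and NOT Checkov's code.
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Stand-in scan results per directory as (passed, skipped),
# using the counts reported in this issue.
FAKE_SCANS = {"deploys1": (154, 9), "deploys2": (3274, 326)}


@dataclass
class ToyRegistry:
    """Hypothetical registry that keeps state between runs."""

    cached_root: Optional[str] = None
    passed: int = 0
    skipped: int = 0

    def run(self, root: str) -> Tuple[int, int]:
        # Assumed flaw 1: the root is remembered from the first call,
        # so later directories are never actually scanned.
        if self.cached_root is None:
            self.cached_root = root
        passed, skipped = FAKE_SCANS[self.cached_root]
        # Assumed flaw 2: the totals are never cleared between directories.
        self.passed += passed
        self.skipped += skipped
        return self.passed, self.skipped


def scan_with_shared_registry(dirs: List[str]) -> List[Tuple[int, int]]:
    """One registry reused for every directory (models the reported behaviour)."""
    registry = ToyRegistry()
    return [registry.run(d) for d in dirs]


def scan_with_fresh_registry(dirs: List[str]) -> List[Tuple[int, int]]:
    """State reset outside the registry: a new instance per directory."""
    return [ToyRegistry().run(d) for d in dirs]
```

Under these assumptions, `scan_with_shared_registry(["deploys1", "deploys2"])` returns `[(154, 9), (308, 18)]`, which matches the output above, while `scan_with_fresh_registry` returns `[(154, 9), (3274, 326)]`. Whether the real cause in Checkov looks like this is exactly the open question.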
You're right that it's currently not supported. What you can do is specify the parent directory of the directories you'd like to scan and then use --skip-path for any paths you don't want scanned.
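For anyone scripting this workaround, a minimal Python sketch of building such a command line follows. The helper `build_parent_scan_command` is a hypothetical name, and the list of subdirectories is passed in explicitly rather than discovered. Only the `-d` and `--skip-path` flags that appear in this thread are used; how Checkov matches the `--skip-path` values is not modelled here.

```python
from typing import List


def build_parent_scan_command(
    parent: str, subdirs: List[str], wanted: List[str]
) -> List[str]:
    """Build a checkov argv that scans `parent` and skips unwanted subdirs.

    Hypothetical helper: `subdirs` is the caller-supplied list of directories
    under `parent`; anything not in `wanted` gets a --skip-path argument.
    """
    cmd = ["checkov", "-d", parent]
    for name in subdirs:
        if name not in wanted:
            cmd += ["--skip-path", name]
    return cmd
```

The resulting list can be handed to `subprocess.run` as-is; building it as a list rather than a string avoids shell quoting issues with unusual directory names.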
@maxamel thanks for that detail. Should it be supported? It's a bit weird that the configuration file directory directive takes an array, if multiple directories are not supported currently.
Is there interest in it being supported? I could raise a PR for it if so, although it feels more like a hotfix than anything else.
Hmm, somehow the skip-path method also doesn't quite work. It's unclear why at the moment:
$ checkov -d . --skip-path skip1 --skip-path skip2 --skip-path skip3 --skip-path skip4 --skip-path skip5 --skip-path skip6 --skip-path skip7 --quiet
terraform scan results:
Passed checks: 2118, Failed checks: 0, Skipped checks: 150
$ checkov -d deployments --quiet
terraform scan results:
Passed checks: 3303, Failed checks: 0, Skipped checks: 431
The first command should total something like 3500 checks and 440 skips.
@ohookins sure, you're welcome to raise a PR regarding the multiple directory issue. If there's a separate bug with skip-path (although I briefly checked it and didn't see issues) it would be better to open another issue with a concrete example that does not work.
Here's some code (admittedly not very good) that fixes the issue: https://github.com/bridgecrewio/checkov/pull/7334
Have raised the skip-path issue here: https://github.com/bridgecrewio/checkov/issues/7336