Implement semgrep for Adoptium repositories
This was a suggestion that came from our external audit team, who generated some custom rules for the semgrep tool for us while they were working with us. A recent blog post describes some of their rules: https://blog.trailofbits.com/2024/01/17/30-new-semgrep-rules-ansible-java-kotlin-shell-scripts-and-more/
In the interests of improving our security it would be beneficial to run these rules as automated checks on each pull request. This will involve implementing semgrep, probably as a GitHub action for convenience alongside our other checks. This would allow us to ensure that going forward we do not introduce any additional issues in the same areas as those already identified. This is a follow-on to the work which documented our existing set of checks in https://github.com/adoptium/infrastructure/issues/2502.
We should start by introducing this on the build or infrastructure repositories and then look at rolling it out more widely to other repositories, including those which were not included in the scope of the audit. I suspect we'll need to do a bit of filtering on the default output before this will be suitable for deployment as a GitHub action check, but this is the best time to look at implementing it, now that we've done a cleanup.
Ref: https://semgrep.dev FAQ with license info: https://semgrep.dev/docs/faq/#how-are-semgrep-and-its-rules-licensed
Noting also that while it's not necessarily the first thing you'll find in the docs, semgrep scan is the best way to get started, rather than the semgrep ci command (which expects you to log in to get access to things).
The getting started with the CLI guide covers the basics, and something like this with the extra rules should do the trick:
semgrep scan --config /path/to/rules /path/to/code
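For anyone wanting to try this locally, here is a minimal sketch of the command above wrapped in a small script. It assumes the Trail of Bits rules are pulled from the Semgrep registry as the p/trailofbits pack (an assumption on my part; a path to a local checkout of their rules can be passed in CONFIG instead). CONFIG, CODE_DIR and RUN_SCAN are hypothetical names used only for this example.

```shell
#!/bin/sh
# Illustrative defaults (assumptions, not taken from the Adoptium setup).
CONFIG="${CONFIG:-p/trailofbits}"   # registry pack, or a path to a local rules checkout
CODE_DIR="${CODE_DIR:-.}"           # directory to scan

# Print the command so it can be reviewed before anything runs.
scan_cmd() {
    printf 'semgrep scan --config %s %s\n' "$CONFIG" "$CODE_DIR"
}

if [ "${RUN_SCAN:-0}" = "1" ]; then
    # "semgrep scan" works without logging in, unlike "semgrep ci".
    semgrep scan --config "$CONFIG" "$CODE_DIR"
else
    # Default: dry run, only show what would be executed.
    scan_cmd
fi
```

By default the script only prints the command; setting RUN_SCAN=1 runs the scan for real.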
I've run semgrep scan using the trailofbits rules against the infrastructure repository, and it highlights the outstanding issues that are already known/documented/mitigated. I've also successfully trialled adding a GitHub Action to run Semgrep with the same rulesets on PRs.
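On the filtering mentioned earlier, the following is a rough sketch of how a PR-style check might narrow the output before failing a build. It is not the workflow used in the PRs; the severity level, the excluded paths, and the variable names (CONFIG, SEVERITY, EXCLUDES, RUN_SCAN) are all assumptions chosen for illustration. The flags themselves (--severity, --exclude, --error) are standard semgrep scan options.

```shell
#!/bin/sh
# Illustrative values only (assumptions, not the Adoptium configuration).
CONFIG="${CONFIG:-p/trailofbits}"   # ruleset to apply
SEVERITY="${SEVERITY:-ERROR}"       # report only rules with this severity
EXCLUDES="${EXCLUDES:-docs test}"   # space-separated paths to skip

# Assemble the semgrep arguments as a single line of text.
build_args() {
    # --error makes semgrep exit non-zero when findings remain,
    # which is what causes a CI check to fail.
    args="scan --error --config $CONFIG --severity $SEVERITY"
    for path in $EXCLUDES; do
        args="$args --exclude $path"
    done
    printf '%s\n' "$args"
}

if [ "${RUN_SCAN:-0}" = "1" ]; then
    # Word splitting of the argument string is intentional here;
    # this simple approach assumes no spaces inside individual values.
    semgrep $(build_args) .
else
    # Default: dry run, only show what would be executed.
    echo "semgrep $(build_args) ."
fi
```

With RUN_SCAN=1 the exit status reflects whether any findings survived the filters, so known and already-mitigated issues can be excluded without hiding new ones.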
I have done the same for the temurin-build repository, with similar results.
I propose adding the Semgrep PR scanner to both of these repositories.
Continuing to investigate adding it to other repositories.
PR for infrastructure repo: #3429
PR for build repo: adoptium/temurin-build#3688
PR for ci-jenkins-pipelines: adoptium/ci-jenkins-pipelines#1034
PR for installers: adoptium/installer#908
PR for aqa-tests: adoptium/aqa-tests#5343
PR for vdr-generator: adoptium/temurin-vdr-generator#21
PR for api: adoptium/api.adoptium.net#1041
PR for jenkins-helper: adoptium/jenkins-helper#61
Following on from the community meeting, clarification is being sought from EF INFRA SEC on implications of using rules with this license, so work on this issue is being temporarily suspended.
In addition to the above, the action already in place on the infrastructure repository will be moved to the .github central action repository, and the existing repo-specific checks amended to pick up the central one, in a fashion similar to that used by the code-freeze bot.
PR to centralise the semgrep action, following approval: https://github.com/adoptium/.github/pull/110
Update on the above: @tellison confirmed with EF infra sec/legal that we are OK to use Semgrep.
The GitHub Actions workflow file has been centralised in the .github repository, and the infrastructure and build repositories have been amended to use it.
Semgrep rolled out to all key repositories.