wasm-micro-runtime

Add CodeQL Workflow for Code Security Analysis

Open b4yuan opened this issue 2 years ago • 7 comments

Summary

This pull request introduces a CodeQL workflow to enhance the security analysis of this repository.

What is CodeQL

CodeQL is a static analysis tool that helps identify and mitigate security vulnerabilities. Its analysis is primarily intra-function, with some support for inter-function analysis. Integrating CodeQL into a GitHub Actions workflow lets the project identify and address potential issues proactively, before they become security threats.

For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).

What this PR does

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that:

  • Runs on every pull request (functionality to run on every push to main branches is included as a comment for convenience).
  • Runs daily.
  • Excludes queries with a high false positive rate or low-severity findings.
  • Does not display results for git submodules, focusing only on our own codebase.
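The submodule exclusion in the last bullet can be pictured as a pass over the SARIF file that CodeQL produces. The following is a minimal sketch, not the mechanism used in this PR's workflow file. It assumes the standard SARIF layout, where each result stores its file path under locations[].physicalLocation.artifactLocation.uri. The helper name drop_paths and any directory prefixes passed to it are hypothetical.

```python
import copy


def drop_paths(sarif, excluded_prefixes):
    """Return a copy of a SARIF log without results under excluded_prefixes.

    `sarif` is the parsed JSON document (a dict). `excluded_prefixes` is a
    list of path prefixes, e.g. the directories where submodules live
    (hypothetical values, chosen by the caller).
    """
    prefixes = tuple(excluded_prefixes)
    filtered = copy.deepcopy(sarif)  # leave the caller's data untouched
    for run in filtered.get("runs", []):
        kept = []
        for result in run.get("results", []):
            # Collect every file path this result points at.
            uris = [
                loc.get("physicalLocation", {})
                .get("artifactLocation", {})
                .get("uri", "")
                for loc in result.get("locations", [])
            ]
            # Skip the result if any of its locations is in excluded code.
            if any(uri.startswith(prefixes) for uri in uris):
                continue
            kept.append(result)
        run["results"] = kept
    return filtered
```

Calling drop_paths(sarif, ["third_party/"]) would keep only results located outside a (hypothetical) third_party/ directory.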

Validation

To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Using the workflow results

If this pull request is merged, the CodeQL workflow will be automatically run on every push to the main branch and on every pull request to the main branch. To view the results of these code scans, follow these steps:

  1. Under the repository name, click on the Security tab.
  2. In the left sidebar, click Code scanning alerts.
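The same alerts can also be read programmatically. The sketch below builds a request for GitHub's REST endpoint GET /repos/{owner}/{repo}/code-scanning/alerts using only the Python standard library. The function name alerts_request and the token argument are illustrative assumptions; the token needs permission to read code scanning alerts, and the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def alerts_request(owner, repo, token, state="open"):
    """Build (but do not send) a request listing code scanning alerts."""
    query = urllib.parse.urlencode({"state": state})
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/code-scanning/alerts?{query}"
    )
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",  # token is supplied by the caller
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to urllib.request.urlopen and decoding the body with json.load would yield a list of alert objects.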

Is this a good idea?

We are researchers at Purdue University in the USA. We are studying the potential benefits and costs of using CodeQL on open-source repositories of embedded software.

We wrote up a report of our findings so far. The TL;DR is “CodeQL outperforms the other freely-available static analysis tools, with fairly low false positive rates and lots of real defects”. You can read the report here: https://arxiv.org/abs/2310.00205

Review of engineering hazards

License: see the license at https://github.com/github/codeql-cli-binaries/blob/main/LICENSE.md:

Here's what you may also do with the Software, but only with an Open Source Codebase and subject to the License Restrictions provisions below:

Perform analysis on the Open Source Codebase.

If the Open Source Codebase is hosted and maintained on GitHub.com, generate CodeQL databases for or during automated analysis, CI, or CD.

False positives: We find that around 20% of reported errors are false positives (FPs), but these FPs are polarized: only a few rules contribute most of them. The top rules contributing to FPs are cpp/uninitialized-local, cpp/missing-check-scanf, cpp/suspicious-pointer-scaling, cpp/unbounded-write, cpp/constant-comparison, and cpp/inconsistent-null-check. Rules with a high FP rate can be filtered out with a simple change to the workflow file.
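The PR applies this filter in the workflow file itself. Purely to illustrate the idea, the sketch below does the equivalent as a post-processing step on a parsed SARIF document, dropping every result whose ruleId is one of the six rules listed above. The names HIGH_FP_RULES and drop_rules are hypothetical and do not come from the PR.

```python
import copy

# The six rules the PR description names as top false-positive contributors.
HIGH_FP_RULES = frozenset({
    "cpp/uninitialized-local",
    "cpp/missing-check-scanf",
    "cpp/suspicious-pointer-scaling",
    "cpp/unbounded-write",
    "cpp/constant-comparison",
    "cpp/inconsistent-null-check",
})


def drop_rules(sarif, excluded_rules=HIGH_FP_RULES):
    """Return a copy of a SARIF log without results from excluded_rules."""
    filtered = copy.deepcopy(sarif)  # do not mutate the input document
    for run in filtered.get("runs", []):
        run["results"] = [
            result
            for result in run.get("results", [])
            if result.get("ruleId") not in excluded_rules
        ]
    return filtered
```

Filtering after the fact keeps the raw results available for later review, whereas excluding queries in the workflow avoids running them at all.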

b4yuan, Nov 23 '23 00:11

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

I want to clarify that fail_on_error.py is a script that forces the step to fail if an error is found within the code (which in turn makes the entire workflow seem like it failed). However, looking at the steps, the workflow successfully built wasm-micro-runtime, analyzed it, and uploaded the results.
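For readers unfamiliar with this kind of gate, here is a minimal sketch of what such a script could look like. It is not the actual fail_on_error.py from this PR. It assumes the standard SARIF layout: a result's level is read from the result itself, falling back to the rule's defaultConfiguration.level under tool.driver.rules, and then to "warning", the SARIF default. The function names are hypothetical.

```python
def count_errors(sarif):
    """Count results in a parsed SARIF log whose effective level is 'error'."""
    count = 0
    for run in sarif.get("runs", []):
        # Map each rule id to the default level declared by the tool.
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        default_level = {
            rule.get("id"): rule.get("defaultConfiguration", {}).get("level", "warning")
            for rule in rules
        }
        for result in run.get("results", []):
            # An explicit level on the result wins over the rule default.
            level = result.get("level") or default_level.get(
                result.get("ruleId"), "warning"
            )
            if level == "error":
                count += 1
    return count


def exit_code_for(sarif):
    """Return 1 if any error-level result exists, otherwise 0."""
    return 1 if count_errors(sarif) > 0 else 0
```

A wrapper would load the SARIF file with json.load and pass exit_code_for(...) to sys.exit. A non-zero exit marks the workflow step as failed even though the build, analysis, and upload all succeeded, which matches the behavior described above.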

b4yuan, Nov 23 '23 00:11

@no1wudi Pinging to check in on a possible followup to this PR?

b4yuan, Nov 29 '23 21:11

@b4yuan Thanks for your contribution. Should the error be fixed?

no1wudi, Dec 01 '23 07:12

@no1wudi The workflow is functioning properly; fail_on_error.py is a script that forces the entire workflow to fail if any errors are found in the CodeQL analysis.

b4yuan, Dec 01 '23 17:12

Got it, thanks.

no1wudi, Dec 02 '23 11:12

@b4yuan Thanks for submitting the PR! It is a really good way to improve WAMR's code quality. IIUC, we should address the issues reported at the link below before merging this PR? https://github.com/bytecodealliance/wasm-micro-runtime/security/code-scanning?query=pr%3A2812+is%3Aopen

wenyongh, Jan 18 '24 10:01

I will merge this PR and create another PR to enhance it: (1) add more compilation combinations to build many kinds of iwasm with different features enabled, and (2) run the CI only in the nightly run and disable it when a PR is created, since the build script takes a long time to run.

wenyongh, Mar 21 '24 04:03

@b4yuan Thanks again for contributing this PR; it is really important for improving the code quality of the WAMR project. I have merged this PR and submitted another PR (#3246) to enhance it and make it run nightly only. We will then look into the reported issues and try to enable triggering CodeQL when a PR is created in the future.

wenyongh, Mar 21 '24 06:03