Summary

This pull request introduces a CodeQL workflow to enhance the security analysis of this repository.

What is CodeQL

CodeQL is a static analysis tool that helps identify and mitigate security vulnerabilities. Its analysis is primarily intra-function, with some support for inter-function analysis. Integrating CodeQL into a GitHub Actions workflow lets the project proactively identify and address potential issues before they become security threats.

For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).

What this PR does

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that:

Runs on every pull request (functionality to run on every push to main branches is included as a comment for convenience).
Runs daily.
Excludes queries with a high false positive rate or low-severity findings.
Does not display results for git submodules, focusing only on our own codebase.
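
As a rough sketch of the triggers listed above (not the exact contents of codeql.yml), the "on" section of such a workflow could look like the fragment below. The branch name and the cron time are assumptions chosen for illustration.

    on:
      # Uncomment to also run on every push to the main branch
      # push:
      #   branches: [ "main" ]        # branch name is an assumption
      pull_request:
        branches: [ "main" ]          # branch name is an assumption
      schedule:
        - cron: "0 0 * * *"           # once a day at 00:00 UTC; time is an assumption

Note that GitHub only runs scheduled workflows from the repository's default branch, so the daily scan takes effect after the workflow file is merged.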

Validation

To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Using the workflow results

If this pull request is merged, the CodeQL workflow will run automatically on every pull request to the main branch and once a day (and on every push to the main branch, if that trigger is uncommented). To view the results of these code scans, follow these steps:

Under the repository name, click on the Security tab.
In the left sidebar, click Code scanning alerts.
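
For alerts to show up on that page, the job that uploads the results needs write access to code scanning. A minimal sketch of the relevant job header follows; the job name and runner are assumptions, and the steps are omitted.

    jobs:
      analyze:                        # job name is an assumption
        runs-on: ubuntu-latest        # runner is an assumption
        permissions:
          security-events: write      # needed to upload code scanning results
          contents: read              # needed to check out the repository
        # steps omitted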

Is this a good idea?

We are researchers at Purdue University in the USA. We are studying the potential benefits and costs of using CodeQL on open-source repositories of embedded software.

We wrote up a report of our findings so far. The TL;DR is “CodeQL outperforms the other freely-available static analysis tools, with fairly low false positive rates and lots of real defects”. You can read the report here: https://arxiv.org/abs/2310.00205

Review of engineering hazards

License: see the license at https://github.com/github/codeql-cli-binaries/blob/main/LICENSE.md:

Here's what you may also do with the Software, but only with an Open Source Codebase and subject to the License Restrictions provisions below:

Perform analysis on the Open Source Codebase.

If the Open Source Codebase is hosted and maintained on GitHub.com, generate CodeQL databases for or during automated analysis, CI, or CD.

False positives: We find that around 20% of errors are false positives (FPs), but that these FPs are concentrated: only a few rules contribute most of them. The top rules contributing to FPs are: cpp/uninitialized-local, cpp/missing-check-scanf, cpp/suspicious-pointer-scaling, cpp/unbounded-write, cpp/constant-comparison, and cpp/inconsistent-null-check. Rules with a high FP rate can be filtered out directly in the workflow file.
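
As an illustration of such a filter (a sketch, not necessarily the configuration shipped in this PR), a filter-sarif step that drops the six rules named above could look like this. The step name and the sarif-results/cpp.sarif path are assumptions; the path must match wherever the analysis step writes its SARIF output.

    - name: Filter out rules with a high false positive rate
      uses: advanced-security/filter-sarif@v1
      with:
        # Each "-" pattern excludes a rule for all files ("**")
        patterns: |
          -**:cpp/uninitialized-local
          -**:cpp/missing-check-scanf
          -**:cpp/suspicious-pointer-scaling
          -**:cpp/unbounded-write
          -**:cpp/constant-comparison
          -**:cpp/inconsistent-null-check
        input: sarif-results/cpp.sarif      # path is an assumption
        output: sarif-results/cpp.sarif     # filtered in place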

Oct 28 '23 03:10 b4yuan

@decidedlygray Pinging to check in on a possible followup to this PR?

Nov 29 '23 21:11 b4yuan

@b4yuan I'm actually the person this project belongs to

I'll look into this, but please note this project isn't under active maintenance or development, so I'm not sure this is the best candidate for your research project.

Also note this project is explicitly intended to build and run on MIPS and Arm embedded Linux systems (though in theory it should be fairly portable) and your GitHub workflow doesn't seem to take that into account.

Nov 29 '23 22:11 zcutlip

If you've been able to run codeql on this independently and can give me some idea of what kind of findings it's producing that would help

I'd like to have some idea what I'm getting into before merging this PR. I don't want to merge and then the first time it runs it produces a bunch of false positives or other things I'm not able to fix.

Nov 29 '23 22:11 zcutlip

@zcutlip Thanks for your comments. We've already gone through a 'data collection' phase where we ran CodeQL on >250 repositories and analyzed what kinds of bugs show up and how to avoid them. At this point, we are reaching out to devs with the configurations that we used so that they may personally benefit from a tool that has already been set up for them.

Also note this project is explicitly intended to build and run on MIPS and Arm embedded Linux systems (though in theory it should be fairly portable) and your GitHub workflow doesn't seem to take that into account.

CodeQL is a static analysis tool, and only analyzes source code. You can think of this as a glorified linter to preemptively

If you've been able to run codeql on this independently and can give me some idea of what kind of findings it's producing that would help

I'd like to have some idea what I'm getting into before merging this PR. I don't want to merge and then the first time it runs it produces a bunch of false positives or other things I'm not able to fix.

Firstly I want to point out that any CodeQL query that results in too many false positives can be filtered out in the workflow:

    # Filter out rules with low severity or high false positive rate
    # Also filter out warnings in third-party code
    - name: Filter out unwanted errors and warnings
      uses: advanced-security/filter-sarif@v1
      with:
        patterns: |
          -**:cpp/path-injection
          -**:cpp/world-writable-file-creation
          -**:cpp/poorly-documented-function
          -**:cpp/potentially-dangerous-function
          -**:cpp/use-of-goto
          -**:cpp/integer-multiplication-cast-to-long
          -**:cpp/comparison-with-wider-type
          -**:cpp/leap-year/*
          -**:cpp/ambiguously-signed-bit-field
          -**:cpp/suspicious-pointer-scaling
          -**:cpp/suspicious-pointer-scaling-void
          -**:cpp/unsigned-comparison-zero
          -**/cmake*/Modules/**
        input: ${{ steps.step1.outputs.sarif-output }}/cpp.sarif
        output: ${{ steps.step1.outputs.sarif-output }}/cpp.sarif
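
For context, the filter step above sits between an analysis step that holds its results back and a step that uploads the filtered file. The sketch below shows one way the surrounding steps could be arranged, following the pattern in the filter-sarif README; the step names, the category value, the output directory name, and the @v2 version tags are assumptions. Only the id step1 is taken from the snippet above.

    - name: Perform CodeQL Analysis
      id: step1                             # referenced as steps.step1 by the filter
      uses: github/codeql-action/analyze@v2
      with:
        category: "/language:cpp"           # value is an assumption
        upload: false                       # do not upload until results are filtered
        output: sarif-results               # directory name is an assumption

    # ... the filter-sarif step shown above goes here ...

    - name: Upload filtered results
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: ${{ steps.step1.outputs.sarif-output }}/cpp.sarif

With this arrangement only the filtered SARIF file reaches the Security tab, so excluded rules and third-party paths never appear as alerts.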

As for example output from our fork: (looks like there have been no bugs found!)

On another repository...

With these bugs, you can dismiss them for any reason (or again, simply filter out a rule).

I want to say that you can even open the Security tab on this repo and under Code scanning it should show the results of the run from the PR. I could be wrong here; from past experience some devs I've been in touch with have been able to and some haven't.

If you're interested enough, I'll point you to the manuscript again: https://arxiv.org/abs/2310.00205. We go into a lot more detail there.

Nov 29 '23 23:11 b4yuan

nvram-faker

Add CodeQL Workflow for Code Security Analysis
