hagl
Add CodeQL Workflow for Code Security Analysis
Summary
This pull request introduces a CodeQL workflow to enhance the security analysis of this repository.
What is CodeQL
CodeQL is a static analysis tool that helps identify and mitigate security vulnerabilities. It is primarily intra-function but does provide some support for inter-function analysis. Integrated into a GitHub Actions workflow, it can identify potential issues proactively, so they can be addressed before they become security threats.
For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).
What this PR does
We added a new CodeQL workflow file (.github/workflows/codeql.yml) that:
- Runs on every pull request (functionality to run on every push to main branches is included as a comment for convenience).
- Runs daily.
- Excludes queries with a high false positive rate or low-severity findings.
- Does not display results for git submodules, focusing only on our own codebase.
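A minimal sketch of what such a workflow file can look like, covering only the triggers and the core analysis steps listed above. The action version tags, the cron time, the `main` branch name, the `cpp` language value, and the use of the autobuild step in place of the repository's own build script are assumptions for illustration, not the contents of this PR's codeql.yml.

```yaml
# Sketch of .github/workflows/codeql.yml; versions and schedule are assumptions.
name: CodeQL

on:
  # push:
  #   branches: [main]      # left commented out, as described above
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 0 * * *"     # once a day; the exact time is an assumption

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # needed to upload code scanning results
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: cpp

      # Stand-in for the project's own build script (assumption).
      - name: Build
        uses: github/codeql-action/autobuild@v3

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
```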
Validation
To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.
Using the workflow results
If this pull request is merged, the CodeQL workflow will run automatically on every pull request to the main branch and once a day; it will also run on every push to the main branch if the commented-out push trigger is enabled. To view the results of these code scans, follow these steps:
- Under the repository name, click on the Security tab.
- In the left sidebar, click Code scanning alerts.
Is this a good idea?
We are researchers at Purdue University in the USA. We are studying the potential benefits and costs of using CodeQL on open-source repositories of embedded software.
We wrote up a report of our findings so far. The TL;DR is “CodeQL outperforms the other freely-available static analysis tools, with fairly low false positive rates and lots of real defects”. You can read the report here: https://arxiv.org/abs/2310.00205
Review of engineering hazards
License: see the license at https://github.com/github/codeql-cli-binaries/blob/main/LICENSE.md:
Here's what you may also do with the Software, but only with an Open Source Codebase and subject to the License Restrictions provisions below:
- Perform analysis on the Open Source Codebase.
- If the Open Source Codebase is hosted and maintained on GitHub.com, generate CodeQL databases for or during automated analysis, CI, or CD.
False positives: We find that around 20% of reported errors are false positives (FPs), but that these FPs are concentrated: only a few rules contribute most of them. The top rules contributing to FPs are: cpp/uninitialized-local, cpp/missing-check-scanf, cpp/suspicious-pointer-scaling, cpp/unbounded-write, cpp/constant-comparison, and cpp/inconsistent-null-check. Filtering out the rules that contribute most to the FP rate can be done simply in the workflow file.
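As a sketch of such a filter: the CodeQL configuration accepts `query-filters` entries that exclude queries by ID. The fragment below passes that configuration inline through the `config` input of the init step. The version tag, the `cpp` language value, and the choice to exclude all six rules are assumptions for illustration; this is not necessarily how this PR's workflow implements its filtering.

```yaml
# Fragment of a job's steps; replaces a plain "Initialize CodeQL" step.
- name: Initialize CodeQL
  uses: github/codeql-action/init@v3   # version tag is an assumption
  with:
    languages: cpp
    # Inline CodeQL configuration: skip the rules named above as top FP sources.
    config: |
      query-filters:
        - exclude:
            id: cpp/uninitialized-local
        - exclude:
            id: cpp/missing-check-scanf
        - exclude:
            id: cpp/suspicious-pointer-scaling
        - exclude:
            id: cpp/unbounded-write
        - exclude:
            id: cpp/constant-comparison
        - exclude:
            id: cpp/inconsistent-null-check
```

Excluding a rule removes its true positives along with its false positives, so a project may prefer to drop only the rules that prove noisy on its own code.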
@gek169 You responded to another PR regarding the automation of the PRs. We've only raised PRs for repositories that we've been able to fork and test CodeQL on, on a repo-by-repo basis.
Regarding the automatically generated code, I'll reiterate that this is the default workflow that GitHub provides if you want to create a CodeQL Analysis workflow under GitHub Actions. It has been modified to support a build script, a script that forces the workflow to fail if there are errors, and filters to remove rules or directories from scanning.
I reiterate that this PR contains neither spam (which by definition is useless, rather than useful) nor malware.
I have replied to the concerns of @gek169 at greater length here.
Do you have a link to a repository which uses CodeQL where I could see what the results look like?
@tuupola You can look under the Security tab of this repo, under Code scanning alerts, to see the results of the CodeQL Analysis from this PR on hagl.
CodeQL results are not visible to anyone but the repo maintainers.
@b4yuan There is nothing there. Apparently this PR first has to be merged to see the output.
While I do believe this PR has been made with good intentions, let me tell you how it looks to an open source maintainer.
Problem #1. A new tool is introduced without prior discussion. Additionally, it takes a considerable amount of time and effort to actually understand what the tool does. An example output would be the minimum. Instead I am now directed to read a marketing site and a documentation site.
https://codeql.github.com/ https://codeql.github.com/docs/
Maybe you could provide an example repository with successful output and failed output of a CodeQL run?
Problem #2. The organisation where the PR originates has forked 275 repositories and anonymous users (without real name) are submitting the same PR to these repositories. This behavior is suspicious and spammy.
https://github.com/IVOES
GitHub is spammy these days. Bad actors are trying to hijack libraries. You should at least make the PRs using a non-anonymous account.
@tuupola Thanks for your feedback, we appreciate it.
Problem #1: Agree, these could be improved. Though I'm curious if you can explain more about the difference you perceive between opening an issue (for discussion) vs. raising a PR (where discussion can still occur, but in the context of the code contribution)?
Problem #2:
- Anonymous account: The student's name is Brian Yuan and his GitHub account is @b4yuan. That doesn't seem any more anonymous than @tuupola?
- Forked 275 repositories: Well, we studied how well CodeQL performed on many repositories and wanted to share the tool with people since we found it works pretty well. Do you think this is inappropriate? I'd love to know an alternative approach you would have found preferable -- academics tend to study open-source software in aggregate and so we end up with hundreds of repos that might benefit from the results of a project.
@tuupola To add, here is an image of our forked repo of hagl
and the CodeQL results in the Code Scanning section under Security:
Regarding 'failed output'---as long as CodeQL is configured properly, there really isn't 'failed output'. Maybe 'bad output', like false positives, but there are options to dismiss an error/warning for different reasons:
It seems I forgot to answer this.
Anonymous account: The student's name is Brian Yuan and his GitHub account is @b4yuan. That doesn't seem any more anonymous than @tuupola?
You might know the students by name. The rest of us do not. Here are screenshots taken when this PR was opened. I think the difference is quite obvious.
vs