fedcode-next: Code pipeline and models to continuously automatically collect fix commits
We should have a code pipeline and models to continuously automatically collect commits and patches that introduce or fix a vulnerability to support reachability analysis. There is already some base that analyses references. Here we need to dig deeper and scout the commit logs, changelogs and issue logs to discover, and bisect if needed, the subset of code changes that we care about.
Today we can detect fix commits based on some explicit references to commits, but these are not always correct. We could validate the fix commits we have already.
We have multiple issues that need to be triaged and "defragmented". We need one issue with only the usable research/projects to:
- [ ] #2000
- [ ] #2002
- [ ] #2001
- [ ] #2003
Hi @pombredanne! As discussed on the call, I want to work on this.
But before diving in, I would like to contribute to a quick good first issue on a similar track.
Some issues of interest:
- https://github.com/aboutcode-org/vulnerablecode/issues/1129
- https://github.com/cve-search/git-vuln-finder
IMO we should treat fix commit data as an advisory, but a special kind of advisory. As brought up by @keshav-space, we can accommodate the changes in the impacted package data model as well. Thanks!
I agree that we should treat fix commits as advisories and maybe avoid creating Codefixv2 entries directly relying on the CollectFixCommitsPipeline to create a Codefixv2 and associate them with the impacted package data model.
But this will limit our ability to detect and store fix commits that are not related to any alias, since some developers just fix a vulnerability without creating a CVE. I think this is out of scope for now, especially since many of these cases are false positives.
IMO we should start with a simple pipeline that parses git logs from key repositories (linux, django)
using regular expressions to search for CVE-xx, GHSA-xx or XSA-xx, and stores the matches as advisories with some references.
For example:
- https://github.com/torvalds/linux/commit/51ac8893a7a51b196501164e645583bf78138699
- https://github.com/django/django/commit/0b42f6a528df966729b24ecaaed67f85e5edc3dc
This will give us really interesting fix commits that we are currently missing in vulnerablecode.
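
To make the regex idea above concrete, here is a minimal sketch, not the actual pipeline code. The function names `find_vulnerability_ids` and `iter_commits_with_ids` are hypothetical, and the ID patterns are my assumptions about the CVE, GHSA and XSA formats, so they may need tuning. It only shells out to plain `git log`.

```python
import re
import subprocess

# Assumed identifier formats (not taken from the vulnerablecode codebase).
VULN_ID_RE = re.compile(
    r"\b(?:CVE-\d{4}-\d{4,}"
    r"|GHSA(?:-[23456789cfghjmpqrvwx]{4}){3}"
    r"|XSA-\d+)\b",
    re.IGNORECASE,
)


def normalize_id(raw_id):
    """Upper-case the prefix; keep the GHSA suffix lower-case, as GitHub writes it."""
    prefix, _, rest = raw_id.partition("-")
    if prefix.upper() == "GHSA":
        return "GHSA-" + rest.lower()
    return raw_id.upper()


def find_vulnerability_ids(message):
    """Return the sorted, de-duplicated vulnerability IDs found in a commit message."""
    return sorted({normalize_id(m.group(0)) for m in VULN_ID_RE.finditer(message)})


def iter_commits_with_ids(repo_path):
    """Yield (commit_hash, ids) for commits whose message mentions a known ID.

    Hypothetical helper: reads the whole log of a local clone in one call.
    """
    # %H = commit hash, %x00 = NUL separator, %B = raw message, %x1e = record separator
    output = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H%x00%B%x1e"],
        check=True,
        capture_output=True,
        text=True,
        errors="replace",
    ).stdout
    for record in output.split("\x1e"):
        commit_hash, _, message = record.strip().partition("\x00")
        ids = find_vulnerability_ids(message)
        if commit_hash and ids:
            yield commit_hash, ids
```

Calling `iter_commits_with_ids("/path/to/clone")` on a local clone would give the candidate commits to store as advisories. Loading the full log at once is fine for a first pass, but a repository the size of linux would probably need streaming or a date range.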
This PR is ready for review:
- https://github.com/aboutcode-org/vulnerablecode/pull/1992 ... it parses fix commits from the Git commits