fedcode-next: Code pipeline and models to continuously automatically collect fix commits

Open pombredanne opened this issue 11 months ago • 6 comments

We should have a code pipeline and models to continuously and automatically collect commits and patches that introduce or fix a vulnerability, in order to support reachability analysis. There is already some base code that analyses references. Here we need to dig deeper and scout the commit logs, changelogs and issue logs to discover, and bisect if needed, the subset of code changes that we care about.

Today we can detect fix commits based on some explicit references to commits, but these are not always correct. We could validate the fix commits we already have.

We have multiple issues that need to be triaged and "defragmented". We need one issue with only the usable research/projects, consolidating:

  • [ ] #2000
  • [ ] #2002
  • [ ] #2001
  • [ ] #2003

pombredanne avatar Jan 05 '25 21:01 pombredanne

Hi @pombredanne! As discussed on the call, I want to work on this.

ArkaprabhaChakraborty avatar Jan 20 '25 18:01 ArkaprabhaChakraborty

But before diving into this, I want to contribute to a quick good first issue on a similar track.

ArkaprabhaChakraborty avatar Jan 20 '25 18:01 ArkaprabhaChakraborty

Some issues of interest:

  • https://github.com/aboutcode-org/vulnerablecode/issues/1129
  • https://github.com/cve-search/git-vuln-finder

pombredanne avatar Jul 18 '25 10:07 pombredanne

IMO we should treat fix commit data as an advisory, but a special advisory. As brought up by @keshav-space, we can accommodate the changes in the impacted package data model as well. Thanks!

TG1999 avatar Oct 11 '25 12:10 TG1999

I agree that we should treat fix commits as advisories, and maybe avoid creating Codefixv2 entries directly, relying on the CollectFixCommitsPipeline to create a Codefixv2 and associate it with the impacted package data model.

However, this will limit our ability to detect and store fix commits that are not related to any alias, since some developers just fix a vulnerability without creating a CVE. I think this is out of scope for now, especially since many of these cases are false positives.

IMO we should start with a simple pipeline that parses the git logs of key repositories (linux, django) using regular expressions, searching for CVE-xx, GHSA-xx or XSA-xx, and stores the matches as advisories with some references.

For example:

  • https://github.com/torvalds/linux/commit/51ac8893a7a51b196501164e645583bf78138699
  • https://github.com/django/django/commit/0b42f6a528df966729b24ecaaed67f85e5edc3dc

This would yield some really interesting fix commits that we are currently missing in VulnerableCode.
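A minimal sketch of the regex idea above, not the actual pipeline: it assumes the log text was produced with `git log --format=%H%x00%B%x1e` (hash, NUL, full message, record separator), and the function names and ID patterns are hypothetical, based only on the public shapes of CVE, GHSA and XSA identifiers.

```python
import re
from collections import defaultdict

# Assumed ID shapes (not taken from VulnerableCode):
#   CVE-YYYY-NNNN..., GHSA-xxxx-xxxx-xxxx, XSA-NNN
VULN_ID_RE = re.compile(
    r"\b(CVE-\d{4}-\d{4,}|GHSA(?:-[0-9a-z]{4}){3}|XSA-\d+)\b"
)


def parse_git_log(raw_log):
    """Yield (commit_hash, message) pairs from text produced by
    `git log --format=%H%x00%B%x1e`."""
    for record in raw_log.split("\x1e"):
        record = record.strip()
        if not record:
            continue
        commit_hash, _, message = record.partition("\x00")
        yield commit_hash.strip(), message


def collect_fix_commits(raw_log):
    """Map each vulnerability ID to the commits whose message mentions it."""
    found = defaultdict(list)
    for commit_hash, message in parse_git_log(raw_log):
        # Deduplicate IDs repeated within a single commit message.
        for vuln_id in sorted(set(VULN_ID_RE.findall(message))):
            found[vuln_id].append(commit_hash)
    return dict(found)


# Hypothetical sample log with placeholder hashes.
sample_log = (
    "aaa\x00Fix CVE-2024-12345 overflow\n\x1e\n"
    "bbb\x00Refactor tests\n\x1e\n"
    "ccc\x00Fixed GHSA-abcd-1234-efgh and CVE-2024-12345\n\x1e\n"
)
fix_commits = collect_fix_commits(sample_log)
```

Each resulting ID could then be stored as an advisory with the commit URLs as references. A mention of an ID in a commit message is only a hint that the commit is a fix, so the matches would still need validation.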

ziadhany avatar Oct 11 '25 13:10 ziadhany

This PR is ready for review:

  • https://github.com/aboutcode-org/vulnerablecode/pull/1992 ... it parses fix commits from Git commits

pombredanne avatar Oct 15 '25 10:10 pombredanne