Enhance code commit collection capabilities for VCIO

Open TG1999 opened this issue 1 month ago • 4 comments

  • VCS URLs that can currently be converted into PURLs (GitHub, Bitbucket, and GitLab) will be supported for code commit collection.
  • For the types that we don't support, we will not create CodeCommits today; instead, we will store them as references as a fallback for now.
  • We will also log these kinds of commits.
  • We will have an improver that takes references, together with their associated advisories, and tries to create valid code commits.
  • Later, when we improve our parsing capabilities in url2purl for various VCS URL types, this improver will handle the rest.
  • We will also store the commit diffs in our models.
fixed_by_commits = [CommitV2(commit_hash="abcd", vcs_url="github/...")]
affected_commits = [CommitV2(commit_hash="abcd", vcs_url="github/...")]
affected_packages = [
    AffectedPackageV2(
        package=purl,
        fixed_by_commits=fixed_by_commits,
        affected_by_commits=affected_commits,
    )
]

yield AdvisoryData(
    aliases=[vuln_id],
    affected_packages=affected_packages,
    references=sorted(references),
    date_published=date_published,
    url=self.data_url,
)
  • This is the valid design.
  • We will introduce fixed_by_commits and introduced_by_commits.
  • For an AffectedPackage dataclass object to be valid, it needs at least one of affected_by_commits, fixed_by_commits, affecting_range, or fixed_by_range.
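
The validity rule above can be sketched with a `__post_init__` check. This is a minimal illustration, not the project's actual dataclass: the field types and default values are assumptions, and only the field names come from the issue text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class CommitV2:
    commit_hash: str
    vcs_url: str


@dataclass
class AffectedPackageV2:
    # Assumption: the package is held as a plain PURL string in this sketch.
    package: str
    affected_by_commits: List[CommitV2] = field(default_factory=list)
    fixed_by_commits: List[CommitV2] = field(default_factory=list)
    affecting_range: Optional[str] = None
    fixed_by_range: Optional[str] = None

    def __post_init__(self):
        # Reject objects that carry neither commits nor version ranges.
        has_data = any(
            [
                self.affected_by_commits,
                self.fixed_by_commits,
                self.affecting_range,
                self.fixed_by_range,
            ]
        )
        if not has_data:
            raise ValueError(
                "AffectedPackageV2 needs at least one of affected_by_commits, "
                "fixed_by_commits, affecting_range, or fixed_by_range."
            )
```

With this check, an object built with only `package` set raises `ValueError`, while one with a single fixing commit is accepted.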

Reference:

  • https://github.com/aboutcode-org/vulnerablecode/pull/2017
  • https://github.com/aboutcode-org/vulnerablecode/issues/1721
  • https://github.com/aboutcode-org/vulnerablecode/issues/2010

TG1999 avatar Nov 14 '25 15:11 TG1999

  • Commit rank should be part of unique_together.
  • Get rid of author, message, and date.
  • commit_patch = models.TextField( null=True, blank=True, help_text="patch content of the commit." )
  • Rename CodeCommit model to CodePatch

TG1999 avatar Nov 18 '25 17:11 TG1999

  • commit_patch_hash = models.TextField( null=True, blank=True, help_text="hash of the patch content of the commit." )

We can introduce commit patch hash later

  • Just commit.
  • Just patch.
  • Both commit and patch.

We need to define a way to store unique contents for patches.

TG1999 avatar Nov 18 '25 17:11 TG1999

class Patch(models.Model):
    patch_url = models.URLField(blank=True, null=True)
    patch_text = models.TextField(blank=True, null=True)
    patch_checksum = models.CharField(max_length=128, blank=True, null=True) # sha512 of patch text

    class Meta:
        unique_together = ("patch_url", "patch_checksum")
  • One advisory can have many patches, and one patch can also be associated with many advisories (M2M).
  • A Patch needs to have either patch_url or patch_text.
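
As a rough illustration of the two rules above, the checksum and the "either patch_url or patch_text" constraint could be handled by plain helper functions before a row is saved. The function names are hypothetical; only the use of SHA-512 over the patch text comes from the model comment.

```python
import hashlib
from typing import Optional


def compute_patch_checksum(patch_text: str) -> str:
    """Return the SHA-512 hex digest of the patch text (128 hex characters)."""
    return hashlib.sha512(patch_text.encode("utf-8")).hexdigest()


def validate_patch(patch_url: Optional[str], patch_text: Optional[str]) -> Optional[str]:
    """
    Enforce that a Patch has a URL or text, and return the checksum to store.
    The checksum is None when only a URL is known.
    """
    if not patch_url and not patch_text:
        raise ValueError("A Patch needs either patch_url or patch_text.")
    return compute_patch_checksum(patch_text) if patch_text else None
```

One design point worth checking: many SQL databases treat NULL values as distinct in unique constraints, so `unique_together = ("patch_url", "patch_checksum")` may not prevent duplicate rows that share a URL but have no checksum.
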
class PackageCommitPatch(models.Model):
    vcs_url = models.URLField()
    commit_hash = models.CharField(max_length=200)
    patch_text = models.TextField(blank=True, null=True)
    patch_checksum = models.CharField(max_length=128, blank=True, null=True)

    class Meta:
        unique_together = ("vcs_url", "commit_hash")
  • ImpactedPackages will have affected_by_package_code_patches and fixed_by_package_code_patches

ziadhany avatar Nov 25 '25 17:11 ziadhany

When we have a VCS URL & Commit hash & no code patch

  • Case 1: We can parse a package and commit - Create PackageCommitPatch with VCS URL and Commit Hash. Also store the patch URL with values of VCS URL & Commit Hash. Make a PURL from VCS URL and Commit Hash and then make URL from PURL.
  • Case 2: We cannot parse a package - For the types that we don't support, we will not create PackageCommitPatch for them today; instead, we store them as Patch with patch URL as a fallback for now. We will also log these kinds of URLs. We will have an improver that takes each Patch with its associated advisories and tries to create valid PackageCommitPatches. Later, when we improve our parsing capabilities in url2purl for various VCS URL types, this improver will handle the rest.
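
The "make a PURL from VCS URL and Commit Hash, then make a URL from the PURL" step in Case 1 might look roughly like the sketch below. It is a simplified stand-in and does not use the real url2purl code: the helper names, the host table, and the commit URL layouts are assumptions, and nested GitLab groups or SSH-style URLs are not handled.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: only these three hosts are treated as supported.
PURL_TYPE_BY_HOST = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}
HOST_BY_PURL_TYPE = {v: k for k, v in PURL_TYPE_BY_HOST.items()}
# Path segment each host uses for a single-commit page.
COMMIT_PATH = {"github": "commit", "gitlab": "-/commit", "bitbucket": "commits"}


def vcs_url_to_purl(vcs_url: str, commit_hash: str) -> Optional[str]:
    """Return a PURL string for a supported VCS URL, or None (Case 2)."""
    parsed = urlparse(vcs_url)
    purl_type = PURL_TYPE_BY_HOST.get((parsed.hostname or "").lower())
    segments = [s for s in parsed.path.split("/") if s]
    if not purl_type or len(segments) < 2:
        return None
    namespace, name = segments[0].lower(), segments[1].lower()
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return f"pkg:{purl_type}/{namespace}/{name}@{commit_hash}"


def purl_to_commit_url(purl: str) -> Optional[str]:
    """Rebuild a browsable commit URL from a PURL made by vcs_url_to_purl."""
    if not purl.startswith("pkg:") or "@" not in purl:
        return None
    path, _, commit_hash = purl[len("pkg:"):].rpartition("@")
    parts = path.split("/")
    if len(parts) != 3 or parts[0] not in HOST_BY_PURL_TYPE:
        return None
    purl_type, namespace, name = parts
    host = HOST_BY_PURL_TYPE[purl_type]
    return f"https://{host}/{namespace}/{name}/{COMMIT_PATH[purl_type]}/{commit_hash}"
```

A URL for which `vcs_url_to_purl` returns `None` corresponds to Case 2, where the value would be kept as a Patch with a patch URL and logged.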

When we have a VCS URL & Commit hash & code patch

  • Case 1: We can parse a package and commit - Create PackageCommitPatch. Store the patch. Also store the patch URL with values of VCS URL & Commit Hash. Make a PURL from VCS URL and Commit Hash and then make URL from PURL.
  • Case 2: We cannot parse a package (unsupported type) - For the types that we don't support, we will not create PackageCommitPatch for them today; instead, we store them as Patch with patch URL and patch text as a fallback for now. We will also log these kinds of URLs. We will have an improver that takes each Patch with its associated advisories and tries to create valid PackageCommitPatches. Later, when we improve our parsing capabilities in url2purl for various VCS URL types, this improver will handle the rest.

When we have no VCS URL & no Commit hash & a code patch

  • Store that as a Patch and relate it to the advisory

When we have a VCS URL & no Commit hash & no code patch

  • Case 1: We can parse a package and commit - Create PackageCommitPatch with VCS URL and Commit Hash. Also store the patch URL with values of VCS URL & Commit Hash. Make a PURL from VCS URL and Commit Hash and then make URL from PURL.
  • Case 2: We cannot parse a package - For the types that we don't support, we will not create PackagePatch for them today; instead, we store them as Patch with patch URL as a fallback for now. We will also log these kinds of URLs. We will have an improver that takes each Patch with its associated advisories and tries to create valid PackagePatches. Later, when we improve our parsing capabilities in url2purl for various VCS URL types, this improver will handle the rest.

When we have a VCS URL & no Commit hash & code patch

  • Case 1: We can parse a package and commit - Create PackagePatch. Store the code patch. Also store the patch URL with values of VCS URL & Commit Hash. Make a PURL from VCS URL and Commit Hash and then make URL from PURL.
  • Case 2: We cannot parse a package (unsupported type) - For the types that we don't support, we will not create PackageCommitPatch for them today; instead, we store them as Patch with patch URL and patch text as a fallback for now. We will also log these kinds of URLs. We will have an improver that takes each Patch with its associated advisories and tries to create valid PackageCommitPatches. Later, when we improve our parsing capabilities in url2purl for various VCS URL types, this improver will handle the rest.
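
Across the cases listed in this comment, the outcome depends on three things: whether a VCS URL exists, whether a package can be parsed from it, and whether a code patch is available. A compact sketch of that decision follows. All names here are hypothetical, the host check is a stand-in for real url2purl parsing, and the single model name "PackageCommitPatch" is used even where the comment says PackagePatch.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Assumption: "can parse a package" is approximated by a host allow-list.
SUPPORTED_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


@dataclass(frozen=True)
class StorageDecision:
    model: str              # "PackageCommitPatch" or "Patch"
    store_patch_text: bool  # whether patch text is saved alongside
    log_for_improver: bool  # whether the URL is logged for a later improver run


def can_parse_package(vcs_url: str) -> bool:
    host = (urlparse(vcs_url).hostname or "").lower()
    return host in SUPPORTED_HOSTS


def decide_storage(vcs_url: Optional[str], patch_text: Optional[str]) -> StorageDecision:
    has_patch = bool(patch_text)
    if not vcs_url:
        if not has_patch:
            # Assumption: the thread does not cover this input, so reject it.
            raise ValueError("Nothing to store: no VCS URL and no code patch.")
        # Patch only: store it as a Patch related to the advisory.
        return StorageDecision("Patch", True, False)
    if can_parse_package(vcs_url):
        # Case 1: supported type.
        return StorageDecision("PackageCommitPatch", has_patch, False)
    # Case 2: unsupported type, keep as a fallback Patch and log it.
    return StorageDecision("Patch", has_patch, True)
```

For example, `decide_storage("https://git.example.org/repo.git", None)` yields a fallback `Patch` that is flagged for the improver, while a GitHub URL yields a `PackageCommitPatch`.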

TG1999 avatar Nov 25 '25 17:11 TG1999

Closed by

  • #2017

ziadhany avatar Dec 17 '25 09:12 ziadhany