
Migrate Vulnerability models to Advisory models

Open TG1999 opened this issue 9 months ago • 8 comments

Validate and deploy advisory dedupe

  • [x] Use same models for staging and production
  • [x] Take a backup of production, copy it to staging and restore it on staging
  • [x] Deploy advisory dedupe, merged with https://github.com/aboutcode-org/vulnerablecode/pull/1795
  • [x] Run only the improver to dedupe advisories
  • [x] Review that everything is okay and check that advisories are deduped (reduced in number): we had 119 million advisories before running the dedupe pipeline and 18 million after
  • [x] And deploy on production
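Going from 119 million to 18 million advisories is a reduction of about 85% (1 - 18/119 ≈ 0.849). As a rough illustration of how such a dedupe can work, here is a minimal sketch that assumes two advisories are duplicates when a hash of their normalized content matches. The `Advisory` fields, `content_hash` and `dedupe` are hypothetical; the pipeline merged in PR #1795 may define advisory identity differently.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Advisory:
    # Hypothetical, simplified advisory record
    created_by: str
    aliases: List[str]
    summary: str
    affected_packages: List[str] = field(default_factory=list)  # purl strings


def content_hash(advisory: Advisory) -> str:
    """Hash the normalized content of an advisory (hypothetical normalization)."""
    normalized = {
        "created_by": advisory.created_by,
        "aliases": sorted(advisory.aliases),
        "summary": advisory.summary.strip(),
        "affected_packages": sorted(advisory.affected_packages),
    }
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(advisories: List[Advisory]) -> List[Advisory]:
    """Keep the first advisory seen for each content hash."""
    seen = {}
    for advisory in advisories:
        seen.setdefault(content_hash(advisory), advisory)
    return list(seen.values())


advisories = [
    Advisory("nvd_importer", ["CVE-2024-0001"], "Buffer overflow"),
    Advisory("nvd_importer", ["CVE-2024-0001"], "Buffer overflow "),  # whitespace-only difference
    Advisory("github_importer", ["GHSA-aaaa-bbbb-cccc", "CVE-2024-0001"], "Buffer overflow"),
]
print(len(dedupe(advisories)))  # 2
```

Including `created_by` in the hash keeps advisories from different importers apart even when they describe the same CVE; whether the real pipeline does this is an assumption here.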

Add advisory ID

  • [x] Add an advisory ID field to the Advisory model and create a schema migration
  • [x] Move the url field just below the advisory_id field.
  • [x] Add an improver pipeline to populate the advisory ID. Advisories created by different importers (created_by) need different treatment to determine the advisory ID from one of the aliases, the URL or the references.
  • [x] Update all importers and improvers, including import_runner and improve_runner, to account for the new advisory ID field.
  • [ ] Test the improver on staging and deploy it on production
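The per-importer treatment described above can be pictured as a dispatch table from `created_by` to an extraction strategy. This is a minimal sketch, not the actual improver: the importer names, the `Advisory` fields and the three strategy functions are hypothetical, and the "last URL path segment" rule is only one plausible way to take an ID from a URL.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class Advisory:
    # Hypothetical, simplified advisory record
    created_by: str
    url: str
    aliases: List[str] = field(default_factory=list)
    reference_ids: List[str] = field(default_factory=list)


def from_aliases(advisory: Advisory) -> Optional[str]:
    # Take the first alias, if any
    return advisory.aliases[0] if advisory.aliases else None


def from_url(advisory: Advisory) -> Optional[str]:
    # Take the last non-empty path segment of the advisory URL
    segments = [s for s in urlparse(advisory.url).path.split("/") if s]
    return segments[-1] if segments else None


def from_references(advisory: Advisory) -> Optional[str]:
    # Take the first reference ID, if any
    return advisory.reference_ids[0] if advisory.reference_ids else None


# Hypothetical importer names mapped to the source of their advisory ID
STRATEGIES: Dict[str, Callable[[Advisory], Optional[str]]] = {
    "importer_a": from_aliases,
    "importer_b": from_url,
    "importer_c": from_references,
}


def derive_advisory_id(advisory: Advisory) -> Optional[str]:
    # Fall back to aliases for importers without an explicit strategy
    strategy = STRATEGIES.get(advisory.created_by, from_aliases)
    return strategy(advisory)


example = Advisory("importer_b", "https://example.org/advisories/EXAMPLE-2024-001")
print(derive_advisory_id(example))  # EXAMPLE-2024-001
```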

Add other fields ...

  • [x] Aliases: Create a new AdvisoryAlias model and migrate aliases from the advisory models to it with an improver, ignoring any alias that is part of the advisory ID. https://github.com/aboutcode-org/vulnerablecode/issues/1777
  • [x] Affected Packages: Create a relationship between packages and advisories, and migrate
  • [x] References: Create AdvisoryReferences, and migrate
  • [x] Severities: Severities need to be refactored. Create new advisory severities so they do not go through references. These will be like VulnerabilitySeverity but directly associated with an advisory.
  • [x] Weakness: Create AdvisoryWeakness, and migrate.
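The alias migration rule (skip the alias already used as the advisory ID) is small enough to sketch. This assumes "part of the advisory ID" means an exact match, and uses a hypothetical `AdvisoryAlias` dataclass as a stand-in for rows of the new model; it also drops repeated aliases, which is an extra assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AdvisoryAlias:
    # Hypothetical stand-in for a row of the new AdvisoryAlias model
    advisory_id: str
    alias: str


def migrate_aliases(advisory_id: str, aliases: List[str]) -> List[AdvisoryAlias]:
    """Build alias rows, skipping repeats and the alias used as the advisory ID."""
    rows = []
    seen = set()
    for alias in aliases:
        if alias == advisory_id or alias in seen:
            continue
        seen.add(alias)
        rows.append(AdvisoryAlias(advisory_id=advisory_id, alias=alias))
    return rows


rows = migrate_aliases(
    "GHSA-aaaa-bbbb-cccc",
    ["GHSA-aaaa-bbbb-cccc", "CVE-2024-0001", "CVE-2024-0001"],
)
print([row.alias for row in rows])  # ['CVE-2024-0001']
```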

Design how to relate to a vulnerability

Update API (v2) and UI.

  • #1882
  • #1883

Remove old models, old fields and old data.

QnA

  • [ ] How do we decide the advisory ID when all importers share exactly the same aliases? For example, if 2 importers only have the alias CVE-XXXX-YYYY, what should the heuristic be? Ans: The advisory ID will not be a unique field, but will be part of a unique together constraint: (url, advisory_id, created_by, etc.)

  • [ ] Complete the migration and API on the basis of data models.
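The answer above means two importers that both only know CVE-XXXX-YYYY still produce two distinct advisories, because the uniqueness key also includes the URL and `created_by`. In Django this would typically be expressed with the `unique_together` Meta option or a `UniqueConstraint`; the dependency-free sketch below models the same idea with a dict keyed on a hypothetical `AdvisoryKey`, using only the three fields named in the answer (the full field list is not final).

```python
from typing import Dict, NamedTuple


class AdvisoryKey(NamedTuple):
    # Assumed subset of the unique-together fields
    url: str
    advisory_id: str
    created_by: str


store: Dict[AdvisoryKey, dict] = {}


def upsert(url: str, advisory_id: str, created_by: str, data: dict) -> bool:
    """Insert an advisory; return False if the same key already exists."""
    key = AdvisoryKey(url, advisory_id, created_by)
    if key in store:
        return False
    store[key] = data
    return True


# Two importers that only know the same CVE alias still produce two records
upsert("https://nvd.nist.gov/vuln/detail/CVE-2024-0001", "CVE-2024-0001", "nvd_importer", {})
upsert("https://example.org/CVE-2024-0001", "CVE-2024-0001", "other_importer", {})
print(len(store))  # 2
```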

TG1999, Mar 05 '25 12:03

We plan to use the following aliases as advisory IDs for the advisories generated by each importer:

  • Apache HTTPD: CVE
  • Apache Kafka: CVE
  • Apache Tomcat: CVE
  • Arch Linux: CVE
  • Curl: CVE
  • Github: GHSA -> CVE
  • Github OSV: GHSA -> CVE
  • Gitlab: GMS -> GHSA -> CVE
  • Debian: CVE
  • Elixir: CVE
  • EPSS: CVE
  • FireEye: MNDT, CVE
  • Gentoo: CVE
  • Istio: CVE
  • Mozilla: CVE
  • OpenSSL: VC-OPENSSL, CVE
  • Postgresql: CVE
  • Xen: CVE
  • Ubuntu USN: CVE
  • Suse: CVE
  • Ruby: CVE
  • OSV: CVE -> GHSA -> PYSEC
  • Redhat: CVE
  • NVD: CVE
  • NPM: CVE
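If the arrows in this list are read as an order of preference ("use GHSA, else CVE"), picking the advisory ID becomes a first-match search over alias prefixes. That reading is an assumption, and the importer keys and `pick_advisory_id` helper below are hypothetical.

```python
from typing import Dict, List, Optional

# Alias prefixes in order of preference, per importer (hypothetical keys;
# assumes "A -> B" in the list above means "use A, else B")
PREFERENCES: Dict[str, List[str]] = {
    "github": ["GHSA", "CVE"],
    "gitlab": ["GMS", "GHSA", "CVE"],
    "osv": ["CVE", "GHSA", "PYSEC"],
    "nvd": ["CVE"],
}


def pick_advisory_id(importer: str, aliases: List[str]) -> Optional[str]:
    """Return the first alias matching the importer's preferred prefixes."""
    for prefix in PREFERENCES.get(importer, ["CVE"]):
        for alias in aliases:
            if alias.upper().startswith(prefix + "-"):
                return alias
    return None


print(pick_advisory_id("gitlab", ["CVE-2024-0001", "GHSA-aaaa-bbbb-cccc"]))  # GHSA-aaaa-bbbb-cccc
```

Returning `None` when nothing matches leaves room for the "create one" case mentioned later for OpenSSL.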

TG1999, Mar 06 '25 12:03

https://github.com/aboutcode-org/vulnerablecode/issues/1796#issuecomment-2703689811 looks good

pombredanne, Mar 18 '25 10:03

  • [x] Add the V2Advisory model.
  • [x] The V2Advisory model should have relationships with other models such as aliases, affected packages, references, severities and weaknesses.
  • [x] The V2Advisory model will have an advisory ID. The advisory ID will be a natural ID: for example, the Red Hat importer will use RHSA and NVD will use CVE. When there is no natural ID, we will create one, for example for OpenSSL.
  • [ ] Refactor all importers according to the latest advisory model.

Make consistent changes in smaller steps.

TG1999, Apr 17 '25 09:04

I was adding the models for AdvisoryDataV2 and have a question about our approach to storing AffectedPackages in the new models. Historically, we stored affected package data in our Advisory like this:

package: PackageURL
affected_version_range: Optional[VersionRange] = None
fixed_version: Optional[Version] = None

Then we unfurl those version ranges using an improver and store the resulting concrete package versions in these models.

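from django.db import models
from packageurl.contrib.django.models import PackageURLMixin

# Vulnerability, VulnerabilitySeverity and BaseQuerySet are defined elsewhere in the same models module.


class Package(PackageURLMixin):
    """
    A software package with related vulnerabilities.
    """

    # Remove the `qualifiers` and `set_package_url` overrides after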
    # https://github.com/package-url/packageurl-python/pull/35
    # https://github.com/package-url/packageurl-python/pull/67
    # gets merged

    affected_by_vulnerabilities = models.ManyToManyField(
        to="Vulnerability",
        through="AffectedByPackageRelatedVulnerability",
    )

    fixing_vulnerabilities = models.ManyToManyField(
        to="Vulnerability",
        through="FixingPackageRelatedVulnerability",
        related_name="fixed_by_packages",  # Unique related_name
    )

    package_url = models.CharField(
        max_length=1000,
        null=False,
        help_text="The Package URL for this package.",
        db_index=True,
    )


class PackageRelatedVulnerabilityBase(models.Model):
    """
    Abstract base class for package-vulnerability relations.
    """

    package = models.ForeignKey(
        Package,
        on_delete=models.CASCADE,
        db_index=True,
        # related_name="%(class)s_set",  # Unique related_name per subclass
    )

    vulnerability = models.ForeignKey(
        Vulnerability,
        on_delete=models.CASCADE,
        db_index=True,
        # related_name="%(class)s_set",  # Unique related_name per subclass
    )

    created_by = models.CharField(
        max_length=100,
        blank=True,
        help_text=(
            "Fully qualified name of the improver prefixed with the module name "
            "responsible for creating this relation. Eg: vulnerabilities.importers.nginx.NginxBasicImprover"
        ),
    )

    class Meta:
        # Defined before its subclasses so they can inherit from it;
        # abstract, so no table is created for the base class itself.
        abstract = True


class FixingPackageRelatedVulnerability(PackageRelatedVulnerabilityBase):
    class Meta(PackageRelatedVulnerabilityBase.Meta):
        verbose_name_plural = "Fixing Package Related Vulnerabilities"


class AffectedByPackageRelatedVulnerability(PackageRelatedVulnerabilityBase):

    severities = models.ManyToManyField(
        VulnerabilitySeverity,
        related_name="affected_package_vulnerability_relations",
    )

    objects = BaseQuerySet.as_manager()

    class Meta(PackageRelatedVulnerabilityBase.Meta):
        verbose_name_plural = "Affected By Package Related Vulnerabilities"
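The unfurling step that turns a stored range into concrete affected versions can be illustrated without any dependencies. The real code works with univers `VersionRange` objects; this sketch instead assumes dotted numeric versions and a range given as (operator, version) constraints that must all hold. The `parse`, `in_range` and `unfurl` helpers are hypothetical.

```python
import operator
from typing import List, Tuple

OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def parse(version: str) -> Tuple[int, ...]:
    # Only handles dotted numeric versions such as "1.2.10";
    # note that (1, 2) and (1, 2, 0) do not compare as equal.
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, constraints: List[Tuple[str, str]]) -> bool:
    """True if the version satisfies every (operator, version) constraint."""
    v = parse(version)
    return all(OPS[op](v, parse(bound)) for op, bound in constraints)


def unfurl(known_versions: List[str], constraints: List[Tuple[str, str]]) -> List[str]:
    """Expand a range into the concrete known versions it covers."""
    return [v for v in known_versions if in_range(v, constraints)]


known = ["1.0.0", "1.2.0", "1.2.10", "2.0.0"]
print(unfurl(known, [(">=", "1.2.0"), ("<", "2.0.0")]))  # ['1.2.0', '1.2.10']
```

Once unfurled, each concrete version can be stored as a package row linked to the advisory, so a query by exact version does not need to evaluate ranges.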

I was thinking of using the same thing in our new models, but I now have some reservations: if we go with an advisory-only model, is this the best approach? If we store version ranges rather than concrete versions, how will someone query advisories? What do you all think?

TG1999, Apr 25 '25 09:04

The style and type of relationships we have between Vulnerability and Package should be carried forward to the new relationship between AdvisoryV2 and Package.

pombredanne, Apr 25 '25 09:04

@TG1999 what about something like this as a base:

class PackageV2(PackageURLMixin):
    """
    A software package with related advisories.
    """

    # Remove the `qualifiers` and `set_package_url` overrides after
    # https://github.com/package-url/packageurl-python/pull/35
    # https://github.com/package-url/packageurl-python/pull/67
    # gets merged

    affected_by_advisories = models.ManyToManyField(
        to="AdvisoryV2",
        through="AffectedByPackageRelatedAdvisory",
    )

    fixing_advisories = models.ManyToManyField(
        to="AdvisoryV2",
        through="FixingPackageRelatedAdvisory",
        related_name="fixed_by_packages",  # Unique related_name
    )

    package_url = models.CharField(
        max_length=1000,
        null=False,
        help_text="The Package URL for this package.",
        db_index=True,
    )

With this approach, the old affected_packages JSON field would be replaced by these relationships:

    affected_packages = models.JSONField(
        blank=True, default=list, help_text="A list of serializable AffectedPackage objects"
    )
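Replacing the JSON field with relationships also addresses the earlier question about querying: once each concrete package version is linked to an advisory, "which advisories affect this purl?" is a direct lookup. This minimal sketch assumes each JSON entry already lists concrete affected versions (for example after unfurling) plus an optional fixed version; the entry keys, `to_relations` and `index_by_package` are hypothetical, and the tuples stand in for AffectedByPackageRelatedAdvisory and FixingPackageRelatedAdvisory rows.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str]  # (package_url, advisory_id)


def to_relations(advisory_id: str, affected_packages: List[dict]) -> Tuple[List[Row], List[Row]]:
    """Turn JSON-style affected package entries into affected-by and fixing rows."""
    affected_by: List[Row] = []
    fixing: List[Row] = []
    for entry in affected_packages:
        base = entry["package"]  # e.g. "pkg:pypi/django" (no version)
        for version in entry.get("affected_versions", []):
            affected_by.append((f"{base}@{version}", advisory_id))
        if entry.get("fixed_version"):
            fixing.append((f"{base}@{entry['fixed_version']}", advisory_id))
    return affected_by, fixing


def index_by_package(rows: List[Row]) -> Dict[str, Set[str]]:
    """Build a purl -> advisory IDs lookup, like querying through the relation."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for purl, advisory_id in rows:
        index[purl].add(advisory_id)
    return index


affected_by, fixing = to_relations(
    "CVE-2024-0001",
    [{"package": "pkg:pypi/django", "affected_versions": ["4.2.0", "4.2.1"], "fixed_version": "4.2.2"}],
)
index = index_by_package(affected_by)
print(sorted(index["pkg:pypi/django@4.2.1"]))  # ['CVE-2024-0001']
print(fixing)  # [('pkg:pypi/django@4.2.2', 'CVE-2024-0001')]
```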

pombredanne, Apr 25 '25 09:04

There is another thing: we cannot migrate to or maintain a clean design with the old pipelines.

Previously, a pipeline would: 1. insert advisories, then 2. import/create vulnerabilities and relationships from the inserted data.

Now this will be a single step: 1. import/create advisories, packages and relationships.

Therefore IMHO:

  1. we do not need intermediate dataclass data structures.
  2. we should create the advisory and its relationships, including packages, all at once.
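Doing the advisory, its packages and its relationships "all at once" maps naturally onto a single database transaction. In Django that would usually be `transaction.atomic()`; to keep this sketch self-contained it uses the standard library `sqlite3` module with a hypothetical, heavily simplified schema and a hypothetical `import_advisory` function, not the real VulnerableCode tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE advisory (id INTEGER PRIMARY KEY, advisory_id TEXT, created_by TEXT);
    CREATE TABLE package (id INTEGER PRIMARY KEY, package_url TEXT UNIQUE);
    CREATE TABLE affected_by (
        advisory INTEGER REFERENCES advisory(id),
        package INTEGER REFERENCES package(id)
    );
    """
)


def import_advisory(conn, advisory_id, created_by, affected_purls):
    """Create the advisory, its packages and relationships in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO advisory (advisory_id, created_by) VALUES (?, ?)",
            (advisory_id, created_by),
        )
        advisory_pk = cur.lastrowid
        for purl in affected_purls:
            # Reuse an existing package row if this purl was seen before
            conn.execute("INSERT OR IGNORE INTO package (package_url) VALUES (?)", (purl,))
            package_pk = conn.execute(
                "SELECT id FROM package WHERE package_url = ?", (purl,)
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO affected_by (advisory, package) VALUES (?, ?)",
                (advisory_pk, package_pk),
            )
    return advisory_pk


import_advisory(conn, "CVE-2024-0001", "example_importer_v2", ["pkg:npm/lodash@4.17.20"])
print(conn.execute("SELECT COUNT(*) FROM affected_by").fetchone()[0])  # 1
```

Because everything happens inside one transaction, a failure partway through leaves no advisory without its relationships, which is the main benefit over the old two-step flow.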

Also, old importers cannot be reused as-is; we need a v2 for each. We should add new pipelines for each importer in the existing scripts and reuse as much code as possible between old and v2.

In any case, we will only switch to v2 once we have migrated the importers and added the API and UI, so this is all internal for now.

pombredanne, Apr 25 '25 10:04

Here is the schedule and list of importers/pipelines we are planning to migrate:

  • Batch 1 https://github.com/aboutcode-org/vulnerablecode/issues/1877
  • Batch 2 https://github.com/aboutcode-org/vulnerablecode/issues/1878
  • Batch 3 https://github.com/aboutcode-org/vulnerablecode/issues/1879
  • Batch 4 https://github.com/aboutcode-org/vulnerablecode/issues/1880
  • Batch 5 https://github.com/aboutcode-org/vulnerablecode/issues/1881

TG1999, May 21 '25 08:05

This is done now!

PRs for references: https://github.com/aboutcode-org/vulnerablecode/pull/1866

To test this, set up VulnerableCode locally and run a new V2 importer pipeline, for example:

./manage.py import gitlab_importer_v2

Then run ./manage.py shell to try the newly introduced models AdvisoryV2 and PackageV2:

from vulnerabilities.models import AdvisoryV2, PackageV2
AdvisoryV2.objects.all()
PackageV2.objects.all()

TG1999, Jul 11 '25 06:07