API performance issues (packages endpoints)

Open tdruez opened this issue 1 year ago • 2 comments

Let's focus on the Package-related endpoints for now as those are the ones used to collect vulnerability data in DejaCode.

Those tests were run on a clean install of VCIO with only the nginx.NginxImporter data set.

Package.objects.all().count()  # 88
Vulnerability.objects.count()  # 39

It's a very small amount of data but somehow looking at a single Package triggers over a thousand queries.

  • Package list /api/packages/ (on only 88 packages) -> 6,124 queries: 5,706 similar queries. Duplicated 73 times.
  • Package details /api/packages/63 -> 1,329 queries: 1,230 similar queries. Duplicated 16 times.
  • Bulk search /api/packages/bulk_search (providing the 88 PURLs): 39,925 queries.

This is quite problematic in the context of batch data collection using the VCIO API. The PackageSerializer and related QuerySets require optimization. Once done, make sure to implement unit tests using assertNumQueries so that future code changes do not add uncontrolled queries back.
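
To illustrate the pattern behind these numbers, here is a minimal, framework-free sketch using only the standard library `sqlite3` module. It is not the VCIO code: the two-table schema, the `demo` PURLs and VCIDs, and the `assert_num_queries` helper are all hypothetical, and the helper only mimics the idea of Django's assertNumQueries by counting SELECT statements. It shows how one extra query per package turns 88 packages into 89 queries, and how a single JOIN returns the same data in 1.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def assert_num_queries(conn, expected):
    """Fail if the block runs a different number of SELECT statements than expected."""
    seen = []

    def record(sql):
        # Count only SELECT statements, ignoring any other traced SQL.
        if sql.lstrip().upper().startswith("SELECT"):
            seen.append(sql)

    conn.set_trace_callback(record)
    try:
        yield seen
    finally:
        conn.set_trace_callback(None)
    assert len(seen) == expected, f"expected {expected} queries, ran {len(seen)}"


def build_db(num_packages=88):
    """Create a hypothetical schema: one vulnerability row per package."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE package (id INTEGER PRIMARY KEY, purl TEXT);
        CREATE TABLE affected (package_id INTEGER, vcid TEXT);
        """
    )
    for i in range(1, num_packages + 1):
        conn.execute("INSERT INTO package VALUES (?, ?)", (i, f"pkg:generic/demo@{i}"))
        conn.execute("INSERT INTO affected VALUES (?, ?)", (i, f"VCID-demo-{i}"))
    conn.commit()
    return conn


def list_naive(conn):
    """One query for the packages, then one more per package (the N+1 pattern)."""
    result = {}
    for pkg_id, purl in conn.execute("SELECT id, purl FROM package").fetchall():
        rows = conn.execute(
            "SELECT vcid FROM affected WHERE package_id = ?", (pkg_id,)
        ).fetchall()
        result[purl] = [vcid for (vcid,) in rows]
    return result


def list_joined(conn):
    """Fetch the same data with a single JOIN."""
    result = {}
    query = (
        "SELECT p.purl, a.vcid FROM package p "
        "LEFT JOIN affected a ON a.package_id = p.id"
    )
    for purl, vcid in conn.execute(query):
        result.setdefault(purl, [])
        if vcid is not None:
            result[purl].append(vcid)
    return result


conn = build_db()
with assert_num_queries(conn, 89):  # 1 list query + 88 per-package queries
    naive = list_naive(conn)
with assert_num_queries(conn, 1):  # a single JOIN
    joined = list_joined(conn)
```

In a Django project the equivalent guard would be a test wrapping the API call in `self.assertNumQueries(...)`, with the expected count pinned once the serializer and QuerySets are optimized.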

Related issues:

  • https://github.com/aboutcode-org/vulnerablecode/issues/1492
  • https://github.com/aboutcode-org/dejacode/issues/94#issuecomment-2298445423
  • https://github.com/aboutcode-org/vulnerablecode/issues/1549

For bulk lookup, we track this here:

  • [ ] #1561

tdruez avatar Aug 06 '24 09:08 tdruez

First batch of improvements went in with this PR:

  • https://github.com/aboutcode-org/vulnerablecode/pull/1547

More to come

pombredanne avatar Aug 20 '24 12:08 pombredanne

The performance is now acceptable. We need to make sure we save a performance baseline.

pombredanne avatar Oct 15 '24 12:10 pombredanne

FYI, the key code to get this fixed is the new API design in #1572

pombredanne avatar Oct 31 '24 15:10 pombredanne

@tdruez @TG1999 is this completed and closable? If yes, please do!

pombredanne avatar Dec 23 '24 22:12 pombredanne

@TG1999 I've run some data collection using the previous and the new implementation (api v1).

Fetching data for 133,485 packages using bulk_search, with 100 PURL entries per request, so about 1,335 HTTP requests in total:

  • Previous: Completed in 39.5 minutes
  • New: Completed in 21.0 minutes

It seems we nearly doubled the performance of the package endpoint (roughly 1.9x faster).
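
For reference, the batching arithmetic behind this comparison can be sketched as follows. This is only an illustration: the `batched` helper and the `demo` PURLs are hypothetical stand-ins, and no HTTP request is made.

```python
import math
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


total_packages = 133_485
batch_size = 100

# Hypothetical PURLs standing in for the real package list.
purls = [f"pkg:generic/demo@{n}" for n in range(total_packages)]
batches = list(batched(purls, batch_size))

# One bulk_search request per batch; the last batch is only partly filled.
request_count = len(batches)
assert request_count == math.ceil(total_packages / batch_size)

# Ratio of the two reported run times (39.5 and 21.0 minutes).
speedup = 39.5 / 21.0
print(f"{request_count} requests, {speedup:.2f}x faster")
```

The final batch holds the remaining 85 PURLs, which is why the request count rounds up rather than down.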

tdruez avatar Dec 24 '24 15:12 tdruez

This is done now!

PRs for reference:

  • https://github.com/aboutcode-org/vulnerablecode/pull/1701
  • https://github.com/aboutcode-org/vulnerablecode/pull/1631
  • https://github.com/aboutcode-org/vulnerablecode/pull/1558

To test this

We have a new endpoint deployed and live on https://public.vulnerablecode.io/api/v2/

  • /api/v2/packages - https://public.vulnerablecode.io/api/v2/packages

Packages endpoint: this endpoint has three filters:

  • affected_by_vulnerability - where we pass a VCID and get only the packages that are affected by it
  • fixing_vulnerability - where we pass a VCID and get only the packages that are fixing this VCID
  • purl - where we pass a PURL and get information about that PURL
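
As a rough sketch of how these filters could be used, the helper below builds request URLs for the endpoint. It assumes the filters are passed as query-string parameters, which is not confirmed in this thread; `packages_url` and the example PURL are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://public.vulnerablecode.io/api/v2/packages"

# The three filter names listed above.
ALLOWED_FILTERS = {"affected_by_vulnerability", "fixing_vulnerability", "purl"}


def packages_url(**filters):
    """Build a packages URL, assuming filters are plain query parameters."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    query = urlencode(filters)
    return f"{BASE_URL}?{query}" if query else BASE_URL


# Packages affected by a given VCID.
affected_url = packages_url(affected_by_vulnerability="VCID-vyuz-5w1k-mqce")

# Information about one package; this PURL is a made-up example.
purl_url = packages_url(purl="pkg:cargo/example@1.0.0")

print(affected_url)
print(purl_url)
```

`urlencode` percent-encodes the `:`, `/` and `@` characters in the PURL, so the value survives as a single query parameter.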

Format:

https://public.vulnerablecode.io/api/v2/packages

If you see the above example:

  • In the results, we now provide "vulnerabilities" and "packages", where "vulnerabilities" contains data on all vulnerabilities that are either fixed by those packages or affecting them.
  • For a single package, the format looks like this:
{
    "purl": "pkg:cargo/[email protected]",
    "affected_by_vulnerabilities": [
        "VCID-vyuz-5w1k-mqce"
    ],
    "fixing_vulnerabilities": [],
    "next_non_vulnerable_version": "1.12.2",
    "latest_non_vulnerable_version": "1.12.2",
    "risk_score": 4.0
}
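
A client could load such an entry into a small typed record, as in the sketch below. The field names come from the example above; the `PackageRecord` class, the `parse_package` helper, and the PURL in the sample are hypothetical and not part of VulnerableCode.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PackageRecord:
    """Hypothetical client-side view of one package entry."""

    purl: str
    affected_by_vulnerabilities: List[str] = field(default_factory=list)
    fixing_vulnerabilities: List[str] = field(default_factory=list)
    next_non_vulnerable_version: Optional[str] = None
    latest_non_vulnerable_version: Optional[str] = None
    risk_score: Optional[float] = None

    @property
    def is_vulnerable(self):
        return bool(self.affected_by_vulnerabilities)


def parse_package(raw):
    """Parse one JSON package entry, ignoring keys this sketch does not know."""
    data = json.loads(raw)
    known = {f.name for f in fields(PackageRecord)}
    return PackageRecord(**{k: v for k, v in data.items() if k in known})


# Sample entry modeled on the format above; the PURL is a made-up example.
sample = """
{
    "purl": "pkg:cargo/example@1.0.0",
    "affected_by_vulnerabilities": ["VCID-vyuz-5w1k-mqce"],
    "fixing_vulnerabilities": [],
    "next_non_vulnerable_version": "1.12.2",
    "latest_non_vulnerable_version": "1.12.2",
    "risk_score": 4.0
}
"""

record = parse_package(sample)
if record.is_vulnerable:
    print(f"{record.purl}: upgrade to {record.next_non_vulnerable_version}")
```

Dropping unknown keys keeps the sketch tolerant of extra fields the API may return.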

Additionally, we have significantly reduced the number of queries to 60%, from https://github.com/aboutcode-org/vulnerablecode/commit/7fa45cb0d9dc802a6057edfb003a9f85cfed95fb#diff-d3ca0948dc3b5eb0b1adecaa9da9d7854628b0b6bbcf5f515bed6cab4d894339R474 to https://github.com/aboutcode-org/vulnerablecode/commit/9702c60bb4bac2b98dd988a47948408a16b2cff3#diff-d3ca0948dc3b5eb0b1adecaa9da9d7854628b0b6bbcf5f515bed6cab4d894339R472

We also added indexes to the models.

TG1999 avatar Mar 21 '25 11:03 TG1999