
Package bulk_search by PURLs is too slow

Open tdruez opened this issue 3 years ago • 4 comments

It takes about 30 seconds to run a single request on a bulk search of 74 purls, see purls.json.zip

Performance needs to be improved to make this usable.


Also, I would like a new purl_only payload option. When provided in the POST data, along with the list of purls, the endpoint should return only a list of the vulnerable PURLs, not the full serialization of each Package. The use case is that, given a list of purls, I only need to know which ones are vulnerable, without any extra details.
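A minimal sketch of how the requested option could behave, not the actual VulnerableCode implementation. The helper name `bulk_search_response`, the `purls` payload key, the stand-in serializer, and the sample data (including the made-up `VCID-example` identifier) are all assumptions for illustration. Only the `purl_only` flag comes from the request above, and the full-detail branch is a placeholder.

```python
import json


def bulk_search_response(payload, vulnerable_purls, serialize):
    """Build a bulk-search response for a request payload (hypothetical helper).

    payload          -- dict parsed from the POST body
    vulnerable_purls -- set of purl strings known to be vulnerable
    serialize        -- callable producing the full package details for a purl
    """
    purls = payload.get("purls", [])
    if payload.get("purl_only"):
        # Return only the purl strings that are vulnerable, in request order.
        return [purl for purl in purls if purl in vulnerable_purls]
    # Placeholder for the existing behaviour: full details for each purl.
    return [serialize(purl) for purl in purls]


# Stand-in data and serializer, made up for this example.
vulnerable = {"pkg:pypi/django@1.11.1"}


def full_details(purl):
    affected = ["VCID-example"] if purl in vulnerable else []
    return {"purl": purl, "affected_by_vulnerabilities": affected}


payload = {
    "purls": ["pkg:pypi/django@1.11.1", "pkg:pypi/requests@2.31.0"],
    "purl_only": True,
}
result = bulk_search_response(payload, vulnerable, full_details)
print(json.dumps(result))
```

With `purl_only` omitted or false, the same call would fall through to the full serialization branch, so existing clients would not be affected.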

tdruez avatar Nov 25 '22 08:11 tdruez

I am thinking of adding a field called package to class Package. This will help us improve the speed of bulk_search by making a single query rather than making queries in a loop. Regarding the concern that changes to the other fields in the table would not be reflected in the package field: I don't think any anonymous user can change the package fields, and if we ever need to change a purl field, we can update the package field during the migration.
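A rough sketch of this idea, using the standard-library `sqlite3` module rather than the project's Django models. The table layout, the column name `package_url`, and the helpers `add_package` and `bulk_lookup` are hypothetical, and the purl string is built in a simplified way (no namespace, qualifiers, subpath, or percent-encoding). It only illustrates that storing the full purl string next to its components allows one `IN` lookup for the whole list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package ("
    "id INTEGER PRIMARY KEY, type TEXT, name TEXT, version TEXT, "
    "package_url TEXT)"
)
# Index the stored purl string so the bulk lookup does not scan the table.
conn.execute("CREATE INDEX package_url_idx ON package (package_url)")


def add_package(type_, name, version):
    # Simplified purl: real purls also carry namespace, qualifiers, subpath.
    purl = f"pkg:{type_}/{name}@{version}"
    conn.execute(
        "INSERT INTO package (type, name, version, package_url) "
        "VALUES (?, ?, ?, ?)",
        (type_, name, version, purl),
    )


def bulk_lookup(purls):
    """Return the stored purls matching any of `purls`, using one query."""
    if not purls:
        return []
    placeholders = ", ".join("?" for _ in purls)
    rows = conn.execute(
        f"SELECT package_url FROM package "
        f"WHERE package_url IN ({placeholders}) ORDER BY package_url",
        list(purls),
    )
    return [row[0] for row in rows]


add_package("pypi", "django", "1.11.1")
add_package("npm", "lodash", "4.17.4")
add_package("gem", "rails", "5.0.0")

found = bulk_lookup(
    ["pkg:pypi/django@1.11.1", "pkg:npm/lodash@4.17.4", "pkg:pypi/unknown@0.1"]
)
print(found)
```

The trade-off is the one raised above: the stored string duplicates the component fields, so it has to be written whenever they are written.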

TG1999 avatar Nov 25 '22 14:11 TG1999

@TG1999 IMHO we need to do some measurements and have these measurements "frozen" in tests so that we can improve this correctly.

You wrote:

I am thinking of adding a field called package to class Package. This will help us improve the speed of bulk_search by making a single query rather than making queries in a loop.

I am not sure I get your design and why this would speed things up. Can you either elaborate or push a branch with a sketch of your ideas?

Now, from a quick look, looking things up in a loop one purl at a time is the likely culprit, see https://github.com/nexB/vulnerablecode/blob/83b2bc6dab5fe15eb8a172956dbdfb385e6a19e0/vulnerabilities/api.py#L255

The likely cure is to create a single query rather than n queries.

But that's just a hunch: we need to profile/measure/log what's happening first.
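One way such a measurement could be sketched, assuming a plain `sqlite3` database instead of the real Django setup: count the SQL statements each lookup strategy issues, so that the number can later be asserted in a test. The table, the sample purls, and the helpers `count_queries`, `lookup_in_loop`, and `lookup_single_query` are made up for illustration. `Connection.set_trace_callback` is the standard-library hook used to record each executed statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (package_url TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO package VALUES (?)",
    [(f"pkg:pypi/example-{i}@1.0",) for i in range(100)],
)
conn.commit()

# Record every SQL statement SQLite executes from here on.
statements = []
conn.set_trace_callback(statements.append)


def count_queries(func, *args):
    """Run func(*args) and return how many SQL statements it executed."""
    statements.clear()
    func(*args)
    return len(statements)


def lookup_in_loop(purls):
    # One SELECT per purl: n queries for n purls.
    found = []
    for purl in purls:
        row = conn.execute(
            "SELECT package_url FROM package WHERE package_url = ?", (purl,)
        ).fetchone()
        if row:
            found.append(row[0])
    return found


def lookup_single_query(purls):
    # One SELECT with an IN clause for the whole list.
    if not purls:
        return []
    placeholders = ", ".join("?" for _ in purls)
    rows = conn.execute(
        f"SELECT package_url FROM package WHERE package_url IN ({placeholders})",
        list(purls),
    ).fetchall()
    return [row[0] for row in rows]


purls = [f"pkg:pypi/example-{i}@1.0" for i in range(74)]
loop_count = count_queries(lookup_in_loop, purls)
single_count = count_queries(lookup_single_query, purls)
print("queries in loop:", loop_count)
print("queries with IN:", single_count)
```

A query count is only one signal. Wall-clock timing and profiling of serialization would still be needed to confirm where the 30 seconds actually go.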

pombredanne avatar Nov 25 '22 15:11 pombredanne

@pombredanne this is a very rough sketch of what I have proposed: https://github.com/nexB/vulnerablecode/pull/1017

we need to profile/measure/log what's happening first.

Sure, I will do some profiling before and after this change.

TG1999 avatar Nov 25 '22 15:11 TG1999

@TG1999 I get now what you mean by package... but a purl is a purl or a package_url, not a package, and that's why I could not get what you meant. Name this purl or package_url.

pombredanne avatar Nov 25 '22 15:11 pombredanne