Package bulk_search by PURLs is too slow
A single bulk search request for 74 PURLs takes about 30 seconds; see purls.json.zip.
Performance needs to be improved to make this usable.
Also, I would like a new purl_only payload option. When it is provided in the POST data along with the list of PURLs, return only the list of vulnerable PURLs instead of the full serialization of each Package.
The use case: given a list of PURLs, I only need to know which ones are vulnerable, without any extra details.
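A rough sketch of how the purl_only option could shape the response. The function name, the `vulnerable_purls` input (assumed to be fetched beforehand) and the simplified dict returned when purl_only is off are all hypothetical placeholders, not the actual VulnerableCode API or serializer:

```python
def bulk_search_response(purls, vulnerable_purls, purl_only=False):
    """Build a bulk search response for a list of PURL strings.

    purls: PURL strings from the POST data.
    vulnerable_purls: set of PURLs known to be vulnerable (assumed to be
        fetched beforehand, ideally with a single query).
    purl_only: hypothetical option; when True, return only the vulnerable
        PURLs instead of a per-package serialization.
    """
    if purl_only:
        # dict.fromkeys drops duplicates while keeping the request order.
        return list(dict.fromkeys(p for p in purls if p in vulnerable_purls))
    # Placeholder standing in for the full Package serialization.
    return [
        {"purl": purl, "is_vulnerable": purl in vulnerable_purls}
        for purl in purls
    ]
```

For example, a POST body like `{"purls": [...], "purl_only": true}` (the `purls` key name is an assumption here) would then yield a plain JSON list of PURL strings.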
I am thinking of adding a field called package to the Package class. This would let us speed up bulk_search by making a single query rather than making queries in a loop. Regarding the concern that changing other fields in the table would not update the package field: I don't think any anonymous user can change the package fields, and if we ever need to change any part of a purl, we can update the package field during the migration.
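A minimal sketch of the stored-field idea, using the stdlib sqlite3 module in place of the Django ORM, with the field named `purl` here. The table layout and the `make_purl`/`save_package` helpers are hypothetical, and `make_purl` ignores qualifiers, subpath and percent-encoding (a library such as packageurl-python would handle those). The point it illustrates is that the stored value is recomputed from the other columns on every write, so it cannot silently drift:

```python
import sqlite3


def make_purl(type_, namespace, name, version):
    """Simplified PURL builder: ignores qualifiers, subpath and encoding."""
    path = f"{namespace}/{name}" if namespace else name
    return f"pkg:{type_}/{path}" + (f"@{version}" if version else "")


def create_schema(conn):
    conn.execute(
        """CREATE TABLE package (
            id INTEGER PRIMARY KEY,
            type TEXT NOT NULL,
            namespace TEXT NOT NULL DEFAULT '',
            name TEXT NOT NULL,
            version TEXT NOT NULL DEFAULT '',
            purl TEXT NOT NULL  -- stored copy, derived from the columns above
        )"""
    )
    # Index the stored purl so exact-match lookups on it can use the index.
    conn.execute("CREATE INDEX package_purl_idx ON package (purl)")


def save_package(conn, type_, namespace, name, version, package_id=None):
    """Insert or update a package, always recomputing the stored purl."""
    purl = make_purl(type_, namespace, name, version)
    values = (type_, namespace, name, version, purl)
    if package_id is None:
        cur = conn.execute(
            "INSERT INTO package (type, namespace, name, version, purl) "
            "VALUES (?, ?, ?, ?, ?)",
            values,
        )
        return cur.lastrowid
    conn.execute(
        "UPDATE package SET type = ?, namespace = ?, name = ?, version = ?, "
        "purl = ? WHERE id = ?",
        (*values, package_id),
    )
    return package_id
```

In a Django model, the same recomputation could live in `save()`; keep in mind that `QuerySet.update()` and `bulk_create()` do not call `save()`, so those code paths would need to set the field explicitly.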
@TG1999 IMHO we need to take some measurements and have these measurements "frozen" in tests so that we can improve this correctly.
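One way to "freeze" a measurement is to count the queries an operation issues and assert on that count, so that a regression back to per-PURL queries fails the test. A minimal sketch using the stdlib sqlite3 trace callback; `count_selects` and `assert_max_selects` are hypothetical helpers, and in a Django test suite `TestCase.assertNumQueries` would play the same role:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_selects(conn):
    """Count the SELECT statements run on `conn` inside the with-block."""
    counter = {"selects": 0}

    def trace(statement):
        if statement.lstrip().upper().startswith("SELECT"):
            counter["selects"] += 1

    conn.set_trace_callback(trace)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)  # stop tracing


def assert_max_selects(conn, limit, func, *args):
    """Call func(*args); fail if it ran more than `limit` SELECTs on conn."""
    with count_selects(conn) as counter:
        result = func(*args)
    if counter["selects"] > limit:
        raise AssertionError(
            f"expected at most {limit} SELECTs, got {counter['selects']}"
        )
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Three separate queries, so three SELECTs are counted.
    with count_selects(conn) as counter:
        for _ in range(3):
            conn.execute("SELECT 1").fetchone()
    print(counter["selects"])
```

A test could then call `assert_max_selects(conn, 1, bulk_lookup, conn, purls)`, where `bulk_lookup` stands for whichever function ends up doing the lookup.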
You wrote:
I am thinking of adding a field called package to the Package class. This would let us speed up bulk_search by making a single query rather than making queries in a loop.
I am not sure I get your design and why this would speed things up. Can you either elaborate or push a branch with a sketch of your ideas?
From a quick look, looking things up one purl at a time in a loop is the likely culprit in https://github.com/nexB/vulnerablecode/blob/83b2bc6dab5fe15eb8a172956dbdfb385e6a19e0/vulnerabilities/api.py#L255
The likely cure is to build a single query rather than n queries.
But that's just a hunch: we need to profile/measure/log what's happening first.
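A sketch of the n-queries versus single-query difference, again with stdlib sqlite3 and an assumed `package` table that has a `purl` column. `lookup_loop` mirrors the one-purl-at-a-time pattern suspected above; `lookup_batched` fetches the same rows with one `IN (...)` query per chunk. The chunk size is an assumption meant to stay under SQLite's limit on bound parameters per statement (999 before SQLite 3.32.0):

```python
import sqlite3

# Hypothetical chunk size, kept below SQLite's bound-parameter limit.
CHUNK_SIZE = 500


def lookup_loop(conn, purls):
    """N queries: one SELECT per PURL."""
    found = {}
    for purl in purls:
        row = conn.execute(
            "SELECT purl, id FROM package WHERE purl = ?", (purl,)
        ).fetchone()
        if row is not None:
            found[row[0]] = row[1]
    return found


def lookup_batched(conn, purls):
    """One SELECT ... IN (...) per chunk of PURLs instead of one per PURL."""
    unique = list(dict.fromkeys(purls))  # drop duplicates, keep order
    found = {}
    for start in range(0, len(unique), CHUNK_SIZE):
        chunk = unique[start:start + CHUNK_SIZE]
        placeholders = ", ".join(["?"] * len(chunk))
        rows = conn.execute(
            f"SELECT purl, id FROM package WHERE purl IN ({placeholders})",
            chunk,
        )
        found.update(rows)  # each row is a (purl, id) pair
    return found
```

For a request of 74 PURLs, `lookup_batched` issues one query instead of 74. With the Django ORM the equivalent would be a `filter(purl__in=...)` lookup, assuming such a stored field exists.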
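For the profiling step, a minimal sketch with the stdlib cProfile and pstats modules; `profile_call` is a hypothetical helper, and in practice it would wrap whatever function serves the bulk search request:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Sorting by cumulative time puts the callers that spend the most time overall at the top, which is where a per-PURL query loop would be expected to show up.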
@pombredanne here is a very rough sketch of what I proposed: https://github.com/nexB/vulnerablecode/pull/1017
we need to profile/measure/log what's happening first.
Sure, I will do some profiling before and after this change.
@TG1999
I now get what you mean by package... but a purl is a purl (or a package_url), not a package, and that's why I could not understand what you meant. Name this field purl or package_url.