vulnerablecode Make VulnerableCode less fragile to schema changes in upstream projects

This is related to https://github.com/nexB/vulnerablecode/issues/266 and https://github.com/nexB/vulnerablecode/issues/244 .

Currently we use schema validators like https://github.com/nexB/vulnerablecode/blob/40fe93611703cd37eb18e72bf2c9c747e5da1863/vulnerabilities/importers/debian.py#L44 . The rationale behind using them was to fail loudly and prevent VulnerableCode from inserting garbage data into the database.

The next step is to evolve this mechanism. In addition to preventing the insertion of garbage data, we want it to:

1. Be greedy with respect to data collection, i.e. don't stop at the first failure; navigate around failures instead.
2. Have robust error logging and handling. Currently we seriously lack this.
3. Have periodic import tests. Periodic GitHub Actions would be used for this. Each importer will have its own action so as to compartmentalize individual importer failures. We could even show their status in our README.
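
To make the first two points concrete, here is a minimal sketch of a "greedy" gather loop that logs and skips malformed records instead of aborting on the first one. Everything in it is hypothetical and for illustration only: `GatherResult`, `validate_record`, `gather`, and the `package` / `vulnerability_id` fields are assumed names, not VulnerableCode's actual API or schema.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("importer_sketch")


@dataclass
class GatherResult:
    """Hypothetical container: records that passed validation, plus errors."""
    valid: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def validate_record(record):
    """Hypothetical validator: raise ValueError if a record looks malformed."""
    if not isinstance(record, dict):
        raise ValueError(f"expected a mapping, got {type(record).__name__}")
    # The required field names below are assumptions made for this sketch.
    for key in ("package", "vulnerability_id"):
        value = record.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or invalid field: {key!r}")


def gather(records):
    """Validate every record; log and skip failures instead of stopping."""
    result = GatherResult()
    for index, record in enumerate(records):
        try:
            validate_record(record)
        except ValueError as exc:
            # Navigate around the failure, but keep a trace of it.
            logger.warning("skipping record %d: %s", index, exc)
            result.errors.append((index, str(exc)))
            continue
        result.valid.append(record)
    return result
```

A periodic check could then fail only when `errors` is non-empty (or exceeds some threshold), while the valid records remain usable.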

Regarding GitHub Actions, these need to finish quickly. We can achieve this by running only the "gather" process in the actions and not the "insert" process.

Oct 30 '20 06:10 sbs2001

Also note that, for point 3 of my comment above to add more value, we want our "insert" process to be bulletproof. It is NOT at the moment, but we can get there.

Oct 30 '20 06:10 sbs2001

By "gather", do you mean that database operations should be avoided? IMO, we can do that by either

Subclassing django.test.TestCase, or
Mocking process_advisories()
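
A rough sketch of the mocking option, using only the standard library. `SketchImporter`, its `fetch()` and `run()` methods, and the assumption that `process_advisories()` is a method taking a list of advisories are all hypothetical stand-ins; the real importer classes and signatures may differ.

```python
import unittest
from unittest import mock


class SketchImporter:
    """Hypothetical stand-in for an importer; not VulnerableCode's real class."""

    def fetch(self):
        # A real importer would download and parse upstream data here.
        return [{"package": "openssl", "vulnerability_id": "CVE-2020-0001"}]

    def process_advisories(self, advisories):
        # Stand-in for the database "insert" step.
        raise RuntimeError("database access is not available in this sketch")

    def run(self):
        advisories = self.fetch()
        self.process_advisories(advisories)
        return len(advisories)


class GatherOnlyTest(unittest.TestCase):
    def test_run_gathers_without_inserting(self):
        importer = SketchImporter()
        # Replace the insert step with a mock so only "gather" really runs.
        with mock.patch.object(SketchImporter, "process_advisories") as fake_insert:
            count = importer.run()
        self.assertEqual(count, 1)
        fake_insert.assert_called_once()
```

With the patch in place the test never touches a database, which should keep a per-importer CI job short.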

Apr 04 '21 13:04 Hritik14

@Hritik14 the gather thing is done via https://github.com/nexB/vulnerablecode/pull/365 . No issues there ;)

Apr 05 '21 04:04 sbs2001

And the insert process is also bulletproof now.

Apr 05 '21 05:04 sbs2001

That's great. I would like to start on 3 then.

Apr 05 '21 09:04 Hritik14