Make VulnerableCode less fragile to schema changes in upstream projects
This is related to https://github.com/nexB/vulnerablecode/issues/266 and https://github.com/nexB/vulnerablecode/issues/244.
Currently we use schema validators like https://github.com/nexB/vulnerablecode/blob/40fe93611703cd37eb18e72bf2c9c747e5da1863/vulnerabilities/importers/debian.py#L44 . My rationale for using them was to fail loudly and to prevent VulnerableCode from inserting garbage data into the db.
The next step is to evolve this mechanism. In addition to preventing the insertion of garbage data, we want it to:
- Be greedy with respect to data collection, i.e. not stop at the first failure but navigate around failures instead.
- Have robust error logging and handling. Currently we seriously lack this.
- Have periodic import tests, run as scheduled GitHub Actions. Each importer will have its own action so as to compartmentalize individual importer failures. We could even show their status in our README.
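To make the first two points concrete, here is a minimal sketch of a "greedy" gather loop that logs each bad record and keeps going. It is only an illustration under stated assumptions: `validate_record`, `gather`, and the required keys are hypothetical names invented for this example, not the existing importer API or the schema validator linked above.

```python
import logging

logger = logging.getLogger("importer_sketch")

# Hypothetical required keys; a real importer would use its own schema.
REQUIRED_KEYS = {"package", "vulnerability_id"}


def validate_record(record):
    """Hypothetical stand-in for a schema validator: raise ValueError on bad input."""
    if not isinstance(record, dict):
        raise ValueError(f"expected a mapping, got {type(record).__name__}")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return record


def gather(records):
    """Collect every valid record; log and remember each failure instead of stopping."""
    valid, errors = [], []
    for index, record in enumerate(records):
        try:
            valid.append(validate_record(record))
        except ValueError as exc:
            # Loud in the logs, but one bad record does not abort the whole import.
            logger.warning("Skipping record %d: %s", index, exc)
            errors.append((index, str(exc)))
    return valid, errors
```

A periodic job could then fail (or just report) when `errors` is non-empty, while nothing invalid ever reaches the insert step.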
Regarding GitHub Actions, these need to finish quickly. We can achieve this by running only the "gather" process in the actions and not the "insert" process.
Also note that, regarding point 3 of my comment above, to add more value we want our "insert" process to be bulletproof. It is NOT atm, but we can get there.
By "gather", do you mean that the database operations should be avoided? IMO, we can do that by either
- Subclassing django.test.TestCase or
- Mocking process_advisories()
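As a rough sketch of the mocking option, the snippet below patches `process_advisories()` with `unittest.mock` so that only the gather step actually runs. `SketchImportRunner` and `run_gather_only` are hypothetical stand-ins written for this example; the real runner's class name and signatures are assumptions here and may differ.

```python
from unittest import mock


class SketchImportRunner:
    """Hypothetical runner: gathers advisories, then hands them to the insert step."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable returning an iterable of advisories

    def process_advisories(self, advisories):
        # Stand-in for the real "insert" step that would write to the database.
        raise RuntimeError("would touch the database")

    def run(self):
        advisories = list(self.fetch())
        self.process_advisories(advisories)
        return advisories


def run_gather_only(runner):
    """Run the import with the insert step mocked out, so no database work happens."""
    with mock.patch.object(runner, "process_advisories") as fake_insert:
        advisories = runner.run()
    return advisories, fake_insert
```

The returned mock can be inspected afterwards, e.g. `fake_insert.assert_called_once_with(advisories)`, to confirm that the gathered data would have been handed to the insert step.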
@Hritik14 the gather part is done via https://github.com/nexB/vulnerablecode/pull/365 . No issues there ;)
And the insert process is also bulletproof now.
That's great. I would like to start on 3 then.