Investigate using the OVAL data for ubuntu provider
Today we parse the CVE information for ubuntu distributions from git://git.launchpad.net/ubuntu-cve-tracker. This is probably correct for unsupported distro versions, but for supported distro versions we should be leveraging the OVAL data at https://security-metadata.canonical.com/oval/. Searching through the git history to merge record changes is slow (hours with the current implementation), so ideally we would either speed up this part of the code or eliminate the need for it altogether.
More investigation is needed to understand:
- where the bottlenecks in the current implementation are today
- whether the hot spots can be refactored to alleviate time and resource pains
- what the OVAL data has or doesn't have over the current git cve-tracker repo
After further investigation, it doesn't make sense to use the OVAL data provided by Canonical: it is only available for "supported OSs" (https://wiki.ubuntu.com/Releases), which omits several distro versions we scan today. We would still need to do all of the processing we do today in order to support the older distro versions.
As for performance, I haven't been able to run this provider to completion, even after letting it run for several hours at a time over several days. After digging, it looks like the main performance bottleneck is the revision history search. Here are some suggested improvements:
- implement concurrent workers (4 to start with). The `git log` call is out-of-process, implying that the process is currently IO bound. Since these are read-only calls we don't need to worry about filesystem locks or internal git locks... this should scale nicely using simple threads for all `_merge_cve()` calls.
- remove `--follow` from the `git log` command. Given that git already tracks renames without `--follow` within the same commit and the nature of the process is to move files within the same commit, this appears safe. This should be verified.
- after cloning or fetching, I've added a `git commit-graph write` to optimize the `git log` calls (see https://git-scm.com/docs/git-commit-graph). This should drop the average `git log` call from ~15 seconds to ~1 second.
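For illustration, here is a minimal sketch of how the worker pool and the commit-graph step could fit together. It is not the provider's actual code: `write_commit_graph`, `git_log_revisions`, and `revisions_for_paths` are hypothetical names, and the use of `--reachable` and `--format=%H` is an assumption about what the lookup needs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def write_commit_graph(repo_dir):
    """Run once after clone/fetch so later `git log` calls can use the commit-graph file."""
    subprocess.run(
        ["git", "-C", repo_dir, "commit-graph", "write", "--reachable"],
        check=True,
    )


def git_log_revisions(repo_dir, path, follow=True):
    """Return the commit hashes that touched `path`, newest first (hypothetical helper)."""
    cmd = ["git", "-C", repo_dir, "log", "--format=%H"]
    if follow:
        cmd.append("--follow")
    cmd += ["--", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.split()


def revisions_for_paths(repo_dir, paths, lookup=git_log_revisions, max_workers=4, follow=True):
    """Look up revisions for many files concurrently.

    Each lookup blocks on an external git process, so plain threads are
    enough to overlap the waiting; the calls are read-only.
    """
    paths = list(paths)  # iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: lookup(repo_dir, p, follow), paths)
        return dict(zip(paths, results))
```

The `lookup` parameter is only there so the threading can be exercised without a real repository; in practice it would stay at its default.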
I tried out all three suggestions and there are some improvements. However, @westonsteimel found several instances that ultimately required the `--follow` flag on the `git log` commands.
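One way to check where `--follow` matters is to compare the two `git log` outputs per file and report commits that only show up with the flag. This is a rough sketch under assumptions, not how the cases above were actually found: `log_hashes` and `missing_without_follow` are hypothetical helpers.

```python
import subprocess


def log_hashes(repo_dir, path, follow):
    """Return commit hashes for `path`, with or without --follow."""
    cmd = ["git", "-C", repo_dir, "log", "--format=%H"]
    if follow:
        cmd.append("--follow")
    cmd += ["--", path]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.split()


def missing_without_follow(repo_dir, paths, log=log_hashes):
    """Map each path to the commits that are only found when --follow is used."""
    missing = {}
    for path in paths:
        with_follow = log(repo_dir, path, True)
        without_follow = set(log(repo_dir, path, False))
        lost = [h for h in with_follow if h not in without_follow]
        if lost:
            missing[path] = lost
    return missing
```

An empty result for a set of files would suggest `--follow` can be dropped for them; any non-empty entry is a file whose history would be truncated.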