Prevent event queue saturation when processing BOMs with large component quantities
Previously, a RepositoryMetaEvent was dispatched for every component in the uploaded BOM. For a BOM with 20k components, 20k events were dispatched.
This happened before the VulnerabilityAnalysisTask was dispatched. Because of the sheer number of RepositoryMetaEvents, vulnerability analysis was drastically delayed. Even worse, event processing across the entire platform essentially halted because DT was busy processing RepositoryMetaEvents.
To prevent such situations, a single RepositoryMetaEvent is now dispatched for the entire project for which the BOM was uploaded. This ensures that the event subsystem remains available for other tasks.
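To illustrate the difference in queue load, here is a minimal sketch. `DispatchSketch`, `MetaEvent`, `perComponent` and `perProject` are hypothetical names made up for this example and are not DT's actual classes; the sketch only models how many events end up on the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-ins for illustration; NOT Dependency-Track's actual classes.
final class DispatchSketch {

    /** Hypothetical event; componentUuid is null when the event covers a whole project. */
    record MetaEvent(UUID projectUuid, UUID componentUuid) {}

    /** Previous behaviour: one event per component, so queue usage grows with BOM size. */
    static List<MetaEvent> perComponent(UUID projectUuid, List<UUID> componentUuids) {
        List<MetaEvent> events = new ArrayList<>();
        for (UUID componentUuid : componentUuids) {
            events.add(new MetaEvent(projectUuid, componentUuid));
        }
        return events;
    }

    /** Proposed behaviour: a single event for the project, regardless of BOM size. */
    static List<MetaEvent> perProject(UUID projectUuid) {
        return List.of(new MetaEvent(projectUuid, null));
    }
}
```

With a 20k-component BOM, `perComponent` yields 20k queue entries while `perProject` yields one; the handler of that single event then walks the project's components sequentially.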
Addresses #1759
The only downside I can see right now is that repository meta analysis will take a little longer in total, because processing switches from parallel to sequential. I think that tradeoff is worth it.
As an alternative, what if DT only dispatched RepositoryMetaEvent for new components?
Taking the BOM with 20K components as an example, if the same 20K components are unchanged upon every BOM upload, these events will still be kicked off and DT will have to check its cache.
Instead, what if DT fired a RepositoryMetaEvent only when a new component is added?
RepositoryMetaEvent is dispatched every 24 hours anyway. By doing this, we could dramatically reduce database calls.
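A rough sketch of what "only new components" could mean in practice, assuming components can be compared by some stable identifier such as a package URL string. `NewComponentFilter` and `newComponents` are hypothetical names, not existing DT code.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; NOT part of Dependency-Track.
final class NewComponentFilter {

    /**
     * Returns the identifiers from the uploaded BOM that the project did not
     * already contain, preserving upload order and dropping duplicates.
     */
    static Set<String> newComponents(Set<String> existing, Collection<String> uploaded) {
        Set<String> added = new LinkedHashSet<>(uploaded);
        added.removeAll(existing);
        return added;
    }
}
```

Events would then be dispatched only for the returned identifiers. If the same BOM is uploaded again unchanged, the result is empty and nothing is dispatched.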
> Instead, what if DT fired a RepositoryMetaEvent only when a new component is added?
That will certainly help for existing projects, but not for new ones. It'd also mean that there's no way to force a refresh of repository meta anymore.
We could optionally dispatch RepositoryMetaEvents for batches of components (say 10-250; I'm not sure what a good size would be).
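For reference, batching could look roughly like the sketch below. `Batches.partition` is a hypothetical helper written for this comment, and the batch size is left as a parameter since a good value is still unknown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; NOT part of Dependency-Track.
final class Batches {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With a batch size of 250, a 20k-component BOM would result in 80 events instead of 20k, while still allowing some parallelism between batches.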
Hmmm, I just noticed that we don't cache repository meta data at all right now. That alone will most likely yield a significant improvement. I'll get this sorted and also see how your suggestion of only considering "new" components performs.
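To make the caching idea concrete, a minimal time-based cache sketch follows. Everything in it is an assumption for illustration: `MetaCacheSketch`, the `remoteLookup` function, keying by package URL string, and the time-to-live are hypothetical and do not describe how DT will actually implement this.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache for illustration; NOT Dependency-Track's implementation.
final class MetaCacheSketch {

    private record Entry(String latestVersion, Instant fetchedAt) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Function<String, String> remoteLookup; // assumed repository call

    MetaCacheSketch(Duration ttl, Function<String, String> remoteLookup) {
        this.ttl = ttl;
        this.remoteLookup = remoteLookup;
    }

    /** Returns the cached value while it is younger than the TTL, else refetches. */
    String latestVersion(String purl, Instant now) {
        Entry cached = entries.get(purl);
        if (cached != null && now.isBefore(cached.fetchedAt().plus(ttl))) {
            return cached.latestVersion();
        }
        String fresh = remoteLookup.apply(purl);
        entries.put(purl, new Entry(fresh, now));
        return fresh;
    }
}
```

Repeated uploads of the same BOM within the TTL would then hit the cache instead of the remote repository; a forced refresh could bypass or clear the cache.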
Cancelling this for now as stated in https://github.com/DependencyTrack/dependency-track/issues/1759#issuecomment-1242768566. The missing caching for repository meta data has been logged in https://github.com/DependencyTrack/dependency-track/issues/1943 and is scheduled for 4.7.