Prevent event queue saturation when processing BOMs with large component quantities

Open nscuro opened this issue 3 years ago • 4 comments

Previously, a RepositoryMetaEvent was dispatched for every component in the uploaded BOM. For a BOM with 20k components, 20k events were dispatched.

This was done before the VulnerabilityAnalysisTask was dispatched. Due to the sheer number of RepositoryMetaEvents, vulnerability analysis would be drastically delayed. Even worse, event processing in the entire platform would essentially halt because DT would be busy processing RepositoryMetaEvents.

To prevent such situations, a single RepositoryMetaEvent is now dispatched for the entire project for which the BOM was uploaded. This ensures that the event subsystem remains available for other tasks.
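To illustrate the difference in queue pressure, here is a minimal sketch. It is not Dependency-Track's actual code: `DispatchSketch`, `MetaEvent`, `dispatchPerComponent`, `dispatchPerProject`, and `samplePurls` are hypothetical names made up for this example, and a plain `Queue` stands in for the event subsystem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; these types only mimic the idea and are not Dependency-Track's classes.
final class DispatchSketch {

    // Stand-in for an event carrying the components whose repository metadata should be analyzed.
    static final class MetaEvent {
        final String projectId;
        final List<String> purls;

        MetaEvent(String projectId, List<String> purls) {
            this.projectId = projectId;
            this.purls = purls;
        }
    }

    // Previous approach: one event per component, so the queue grows with the BOM size.
    static void dispatchPerComponent(Queue<MetaEvent> queue, String projectId, List<String> purls) {
        for (String purl : purls) {
            queue.add(new MetaEvent(projectId, List.of(purl)));
        }
    }

    // Approach described above: one event per project, regardless of BOM size.
    static void dispatchPerProject(Queue<MetaEvent> queue, String projectId, List<String> purls) {
        queue.add(new MetaEvent(projectId, List.copyOf(purls)));
    }

    // Generates n made-up package URLs for demonstration.
    static List<String> samplePurls(int n) {
        List<String> purls = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            purls.add("pkg:maven/com.example/lib-" + i + "@1.0.0");
        }
        return purls;
    }
}
```

With a 20k-component BOM, the first method enqueues 20k entries ahead of anything dispatched later, while the second enqueues exactly one.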

Addresses #1759

nscuro avatar Jul 08 '22 20:07 nscuro

Only downside I can fathom right now is that repository meta analysis will take a little longer in total, due to the switch from parallelized to sequential processing. But I think that tradeoff is worth it.

nscuro avatar Jul 08 '22 20:07 nscuro

As an alternative, what if DT only dispatched RepositoryMetaEvent for new components?

Taking the BOM with 20K components as an example, if the same 20K components are unchanged upon every BOM upload, these events will still be kicked off and DT will have to check its cache.

Instead, if DT could only fire RepositoryMetaEvent when a new component is added?

RepositoryMetaEvent is dispatched every 24 hours anyway. By doing this, we could dramatically reduce database calls.
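A rough sketch of the "new components only" idea, assuming components can be identified by their package URL. `NewComponentFilter` and `newComponents` are hypothetical names for illustration; how DT would actually determine which components are already known is not specified here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Dependency-Track.
final class NewComponentFilter {

    // Returns the uploaded components that are not yet known for the project,
    // preserving upload order and dropping duplicates within the upload itself.
    static List<String> newComponents(Set<String> knownPurls, List<String> uploadedPurls) {
        Set<String> seen = new HashSet<>(knownPurls);
        List<String> result = new ArrayList<>();
        for (String purl : uploadedPurls) {
            // Set.add returns true only if the element was not present before.
            if (seen.add(purl)) {
                result.add(purl);
            }
        }
        return result;
    }
}
```

If the uploaded BOM is unchanged, the returned list is empty and no event would need to be dispatched.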

stevespringett avatar Jul 08 '22 20:07 stevespringett

"Instead, if DT could only fire RepositoryMetaEvent when a new component is added?"

That will certainly help for existing projects, but not for new ones. It'd also mean that there's no way to force a refresh of repository meta anymore.

Alternatively, we could dispatch RepositoryMetaEvents for batches of components (say 10-250 per event; I'm not really sure what a good size would be).
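For the batching variant, a minimal sketch of splitting a component list into fixed-size chunks, one chunk per event. `BatchSketch` and `partition` are hypothetical names, and the batch size is a free parameter since no good value has been determined.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Dependency-Track.
final class BatchSketch {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

At a batch size of 250, a 20k-component BOM would produce 80 events instead of 20k, which keeps some parallelism while bounding the number of queued events.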

nscuro avatar Jul 10 '22 18:07 nscuro

Hmmm, I just noticed that we don't cache repository meta data at all right now. That alone will most likely yield a significant improvement. I'll get this sorted and also see how your suggestion of only considering "new" components performs.

nscuro avatar Aug 03 '22 20:08 nscuro

Cancelling this for now as stated in https://github.com/DependencyTrack/dependency-track/issues/1759#issuecomment-1242768566. The missing caching for repository meta data has been logged in https://github.com/DependencyTrack/dependency-track/issues/1943 and is scheduled for 4.7.

nscuro avatar Sep 10 '22 17:09 nscuro