
Performance Enhancement: Cache package data during enrichment

Open goneall opened this issue 11 months ago • 0 comments

I have a very large SBOM I'm enriching with Parlay - it takes about 7 hours to run. This is likely due to rate limiting on the upstream data requests.

Looking at the SBOM, about 80% of the package URLs are duplicates.

Caching the data requests reduced the time to one hour and 15 minutes.

I implemented this for the SPDX enrichment:

https://github.com/goneall/parlay/blob/f8c5a3b409140a3740a74d34aef86288116dfc43/lib/ecosystems/enrich_spdx.go#L38
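
Roughly, the idea is to memoize upstream lookups by package URL so each distinct purl is only requested once. A minimal sketch of that idea, not the code linked above: `PackageData`, `fetchFunc`, and `cachedFetcher` are hypothetical names for illustration and are not part of Parlay's API.

```go
package main

import (
	"fmt"
	"sync"
)

// PackageData is a hypothetical stand-in for whatever the upstream API returns.
type PackageData struct {
	License string
}

// fetchFunc is a hypothetical signature for the upstream lookup.
type fetchFunc func(purl string) (PackageData, error)

// cachedFetcher memoizes successful lookups, keyed by package URL.
type cachedFetcher struct {
	mu    sync.Mutex
	cache map[string]PackageData
	fetch fetchFunc
}

func newCachedFetcher(fetch fetchFunc) *cachedFetcher {
	return &cachedFetcher{cache: make(map[string]PackageData), fetch: fetch}
}

// Get returns the cached data for purl if present. Otherwise it calls the
// upstream fetch and stores the result. Errors are not cached, so a failed
// (for example rate-limited) lookup can be retried later.
func (c *cachedFetcher) Get(purl string) (PackageData, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if data, ok := c.cache[purl]; ok {
		return data, nil
	}
	data, err := c.fetch(purl)
	if err != nil {
		return PackageData{}, err
	}
	c.cache[purl] = data
	return data, nil
}

func main() {
	calls := 0
	f := newCachedFetcher(func(purl string) (PackageData, error) {
		calls++ // count how often the (fake) upstream is actually hit
		return PackageData{License: "MIT"}, nil
	})
	purls := []string{
		"pkg:npm/left-pad@1.3.0",
		"pkg:npm/left-pad@1.3.0", // duplicate, served from the cache
		"pkg:npm/lodash@4.17.21",
	}
	for _, p := range purls {
		if _, err := f.Get(p); err != nil {
			fmt.Println("lookup failed:", err)
		}
	}
	fmt.Println("upstream calls:", calls) // prints "upstream calls: 2"
}
```

The sketch holds the lock during the fetch, which serializes requests; that keeps it simple and is probably fine when the upstream is rate limited anyway. Values are stored and returned by value, so a caller cannot modify the cached copy.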

Caveats: I'm not a Go programmer, and most of the coding was done with Visual Studio Copilot. I'm running some comparisons of the results, and some of the license data looks different from the non-cached version. I don't know whether this is an issue with the code or with the upstream data sources. The other fields do seem to be properly augmented.

goneall, Feb 15 '25 18:02