Performance Enhancement: Cache package data during enrichment
I'm enriching a very large SBOM with Parlay, and a run takes about 7 hours. This is likely due to rate limiting on the upstream data requests.
About 80% of the package URLs in the SBOM are duplicates.
Caching the data requests reduced the run time to 1 hour and 15 minutes.
I implemented this for the SPDX enrichment:
https://github.com/goneall/parlay/blob/f8c5a3b409140a3740a74d34aef86288116dfc43/lib/ecosystems/enrich_spdx.go#L38
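For readers who don't want to open the link, here is a minimal sketch of the general idea. It is not the code from the commit above and it does not use Parlay's actual types: `PackageData`, `FetchFunc`, and `PurlCache` are hypothetical names made up for illustration. It assumes the upstream lookup can be wrapped in a function that takes the package URL as a string.

```go
package main

import "sync"

// PackageData is a hypothetical stand-in for whatever the upstream lookup returns.
type PackageData struct {
	License  string
	Homepage string
}

// FetchFunc is a hypothetical signature for the rate-limited upstream request.
type FetchFunc func(purl string) (*PackageData, error)

// PurlCache memoizes successful lookups, keyed by the package URL string.
type PurlCache struct {
	mu      sync.Mutex
	entries map[string]*PackageData
	fetch   FetchFunc
}

// NewPurlCache wraps fetch with an in-memory cache.
func NewPurlCache(fetch FetchFunc) *PurlCache {
	return &PurlCache{entries: make(map[string]*PackageData), fetch: fetch}
}

// Get returns the cached data for purl and calls fetch only on a cache miss.
// Errors are not cached, so a lookup that failed (for example because of
// rate limiting) is retried the next time the same purl is requested.
// The lock is held during fetch, which serializes upstream requests.
// Callers share the same pointer, so they should treat it as read-only.
func (c *PurlCache) Get(purl string) (*PackageData, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if data, ok := c.entries[purl]; ok {
		return data, nil
	}
	data, err := c.fetch(purl)
	if err != nil {
		return nil, err
	}
	c.entries[purl] = data
	return data, nil
}
```

With a cache like this, the enrichment loop would call `cache.Get(purl)` for each package instead of calling the upstream request directly, so each distinct package URL is requested only once per run.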
Caveats: I'm not a Go programmer, and most of the coding was done with Visual Studio Copilot. I'm comparing the cached results against the non-cached version, and some of the license data looks different. I don't know if this is an issue with the code or with the upstream data sources. The other fields do seem to be properly augmented.