Support mirroring through API

Open VinodAnandan opened this issue 3 years ago • 4 comments

Users may want to mirror OSV data in bulk to improve performance on both sides. If the OSV API could provide batch data similar to what the GHSA API offers, it would help with mirroring.

Related links:

https://github.com/DependencyTrack/dependency-track/blob/master/src/main/java/org/dependencytrack/tasks/GitHubAdvisoryMirrorTask.java

https://github.com/DependencyTrack/dependency-track/blob/master/src/main/resources/templates/github/securityAdvisories.peb

VinodAnandan avatar Jun 03 '22 20:06 VinodAnandan

Hey @VinodAnandan!

Could you please clarify what you mean by mirroring batch data? Do you mean accessing a data dump of all aggregated OSV data?

There is already a way to do so here: https://github.com/google/osv#data-dumps

oliverchang avatar Jun 06 '22 04:06 oliverchang

Hey @oliverchang, the initial batch would contain all the data at that particular point in time. Subsequent runs would fetch only the new or modified data.

Can we periodically fetch incremental data from the OSV data dumps? If so, could you please share any documentation?

VinodAnandan avatar Jun 06 '22 10:06 VinodAnandan

We don't have any functionality today to provide incremental data, mostly because we haven't seen a pressing need for it. The total size of all vulnerability data should stay small enough that it's simpler to just bulk process all entries from scratch each time. This simplicity may significantly outweigh any potential efficiency gains from a more complicated incremental setup.

Here are the current sizes across all of OSV:

> gsutil ls -lah 'gs://osv-vulnerabilities/**/all.zip'   
257.89 KiB  2022-06-07T06:57:05Z  gs://osv-vulnerabilities/Android/all.zip#1654585025273555  metageneration=1
 11.08 KiB  2022-06-07T06:57:07Z  gs://osv-vulnerabilities/DWF/all.zip#1654585027026057  metageneration=1
 19.62 KiB  2022-06-07T06:57:07Z  gs://osv-vulnerabilities/GSD/all.zip#1654585027622910  metageneration=1
701.51 KiB  2022-06-07T06:57:12Z  gs://osv-vulnerabilities/Go/all.zip#1654585032781187  metageneration=1
 11.36 KiB  2022-06-07T06:57:15Z  gs://osv-vulnerabilities/Hex/all.zip#1654585035336832  metageneration=1
     783 B  2022-06-07T06:57:15Z  gs://osv-vulnerabilities/JavaScript/all.zip#1654585035648760  metageneration=1
  9.18 MiB  2022-06-07T06:58:39Z  gs://osv-vulnerabilities/Linux/all.zip#1654585119381016  metageneration=1
  1.71 MiB  2022-06-07T06:59:05Z  gs://osv-vulnerabilities/Maven/all.zip#1654585145838779  metageneration=1
212.78 KiB  2022-06-07T06:59:11Z  gs://osv-vulnerabilities/NuGet/all.zip#1654585151540424  metageneration=1
  1.62 MiB  2022-06-07T06:59:25Z  gs://osv-vulnerabilities/OSS-Fuzz/all.zip#1654585165363352  metageneration=1
911.27 KiB  2022-06-07T06:59:39Z  gs://osv-vulnerabilities/Packagist/all.zip#1654585179108957  metageneration=1
  3.58 MiB  2022-06-07T07:00:05Z  gs://osv-vulnerabilities/PyPI/all.zip#1654585205754672  metageneration=1
560.23 KiB  2022-06-07T07:00:19Z  gs://osv-vulnerabilities/RubyGems/all.zip#1654585219361574  metageneration=1
      22 B  2022-06-07T07:00:21Z  gs://osv-vulnerabilities/UVI/all.zip#1654585220970207  metageneration=1
721.16 KiB  2022-06-07T07:00:25Z  gs://osv-vulnerabilities/crates.io/all.zip#1654585225815500  metageneration=1
  2.28 MiB  2022-06-07T07:00:41Z  gs://osv-vulnerabilities/npm/all.zip#1654585241801811  metageneration=1

Will this cause issues?
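
For readers who want to try the bulk approach described above, here is a minimal sketch in Python of processing one ecosystem's `all.zip` from scratch. It assumes the archive contains one JSON file per vulnerability with an `id` field (as in the OSV schema), and that the bucket is reachable over HTTPS at the URL pattern shown; `fetch_dump` and `load_osv_dump` are hypothetical helper names, not part of any OSV tooling.

```python
import io
import json
import urllib.request
import zipfile

# Assumption: public HTTPS form of gs://osv-vulnerabilities/<ecosystem>/all.zip.
DUMP_URL = "https://osv-vulnerabilities.storage.googleapis.com/{ecosystem}/all.zip"


def fetch_dump(ecosystem: str) -> bytes:
    """Download the full dump for one ecosystem (hypothetical helper)."""
    with urllib.request.urlopen(DUMP_URL.format(ecosystem=ecosystem)) as resp:
        return resp.read()


def load_osv_dump(zip_bytes: bytes) -> dict[str, dict]:
    """Parse an all.zip archive into a mapping of vulnerability ID -> record."""
    records = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip anything that is not a JSON vulnerability record.
            if not name.endswith(".json"):
                continue
            record = json.loads(archive.read(name))
            records[record["id"]] = record
    return records
```

A mirror job could call `load_osv_dump(fetch_dump("PyPI"))` on a schedule and replace its local copy wholesale, which matches the "bulk process from scratch" suggestion.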

oliverchang avatar Jun 07 '22 07:06 oliverchang

Thanks @oliverchang, we will use the full download as an interim solution. But I think incremental updates would enable smaller and more frequent downloads of the database.
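
Until an incremental feed exists, one possible client-side workaround is to download the full dump and apply only the records that changed since the last sync. The sketch below is an illustration, not an OSV feature: it assumes each record carries the OSV schema's RFC 3339 `modified` timestamp, and `parse_rfc3339` and `changed_since` are hypothetical helper names.

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp such as 2022-06-07T06:57:05Z into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def changed_since(records, last_sync: datetime) -> list[dict]:
    """Return the records whose `modified` time is later than last_sync."""
    return [r for r in records if parse_rfc3339(r["modified"]) > last_sync]
```

This keeps database writes small, but it does not reduce download size, and it cannot detect records that were deleted upstream; a mirror would need to compare ID sets for that.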

VinodAnandan avatar Jun 10 '22 22:06 VinodAnandan