[datasource] Docker Hardened Images

Open cdupuis opened this issue 1 month ago • 2 comments

  • [x] Prepare your data - refer to the OSV Schema documentation for information on how to properly format the data so it can be accepted.

Our OSV data is at https://github.com/docker-hardened-images/advisories.

https://github.com/ossf/osv-schema/pull/455

  • [x] Prepare and publish your records via a Git repository (example). If this method isn’t ideal, we also support publishing records from REST API endpoints or through a GCS bucket (example).

  • [x] To support API querying, please create a PR to extend purl_helpers.py and create a new ecosystem in _ecosystems.py. You can refer to existing examples showing how to implement support for Semver and non-Semver ecosystems.

https://github.com/google/osv.dev/pull/4388

cdupuis, Nov 22 '25 16:11

Hey @cdupuis, excited to have Docker Hardened Images onboard!

Going through the published advisories, I've noticed a few issues that it'd be great to have addressed before ingestion.

  1. "ecosystem": "DHI" should match the value in the schema, which you've set as "Docker Hardened Images".
  2. Record ids should carry the database prefix chosen in the schema PR, in this case DHI. It would be great to change the prefix of the files themselves as well, but that isn't required.
  3. Use the latest version of the schema so you can use the upstream field, and move the CVE id there instead. GHSA ids should ideally also move from aliases to upstream, but that isn't required. Happy to explain this further, but hopefully this blog post introducing the field should also help :)
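For illustration, the three points above could be applied to a record roughly as in the sketch below. It is only a sketch under assumptions: the helper name normalize_record is hypothetical, the new id is assumed to be "DHI-" followed by the existing id, and records are assumed to currently use the CVE id as their own id. The actual id format is up to the DHI pipeline.

```python
import copy

ECOSYSTEM = "Docker Hardened Images"   # ecosystem value named in the schema PR
ID_PREFIX = "DHI-"                     # assumed id shape: prefix + existing id
UPSTREAM_PREFIXES = ("CVE-", "GHSA-")  # ids to move from aliases to upstream


def normalize_record(record):
    """Return a copy of an OSV record with the three review points applied."""
    fixed = copy.deepcopy(record)

    # 1. Align the ecosystem name with the value in the schema.
    for affected in fixed.get("affected", []):
        package = affected.get("package", {})
        if package.get("ecosystem") == "DHI":
            package["ecosystem"] = ECOSYSTEM

    # 2. Give the record a database-prefixed id. If the old id was a CVE id
    #    (an assumption about the current data), keep it as an upstream entry.
    upstream = list(fixed.get("upstream", []))
    original_id = fixed["id"]
    if not original_id.startswith(ID_PREFIX):
        fixed["id"] = ID_PREFIX + original_id
        if original_id.startswith("CVE-") and original_id not in upstream:
            upstream.append(original_id)

    # 3. Move CVE (and, optionally, GHSA) ids out of aliases into upstream.
    remaining = []
    for alias in fixed.get("aliases", []):
        if alias.startswith(UPSTREAM_PREFIXES):
            if alias not in upstream:
                upstream.append(alias)
        else:
            remaining.append(alias)

    if upstream:
        fixed["upstream"] = upstream
    if remaining:
        fixed["aliases"] = remaining
    else:
        fixed.pop("aliases", None)
    return fixed
```

The function works on a deep copy, so the input record is left untouched and the result can be diffed against the original before publishing.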

On a per-record basis, some records have mild issues. We have a linting tool that should help identify most of them, found in the osv-schema repository here. We've recently added a new flag, --new-ecosystem, that should prevent the influx of ecosystem-related noise before the ecosystem is merged into the schema.

I've also noticed a couple of issues that the linter can't currently pick up:

  • Some records like CVE-2021-31957 have package information but no affected versions, meaning nothing will be matchable. Technically, this is valid in the schema but not recommended.
  • CVE-2022-31008 has multiple overlapping affected ranges for the same package. I'd recommend moving these into the same affected struct, with multiple ranges. If these version ranges are on different branches but share the same introduced value, they can be grouped in the same events struct like:
"events": [
  {
    "introduced": "0"
  },
  {
    "fixed": "3.10.2"
  },
  {
    "fixed": "3.9.18"
  },
  {
    "fixed": "3.8.32"
  }
]

but I have a suspicion based on the CVE5 record that these ranges shouldn't actually all be starting at 0.
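Until the linter covers these two cases, a small pre-publish check along the following lines might help. This is a rough sketch, not part of the OSV tooling: both function names are hypothetical, and it assumes two affected entries describe the same package when their ecosystem and name match.

```python
import copy


def unmatchable_entries(record):
    """Return names of packages whose affected entry has no versions and no range events."""
    names = []
    for affected in record.get("affected", []):
        has_versions = bool(affected.get("versions"))
        has_events = any(r.get("events") for r in affected.get("ranges", []))
        if not (has_versions or has_events):
            names.append(affected.get("package", {}).get("name"))
    return names


def merge_affected_by_package(record):
    """Group affected entries for the same package into one entry with multiple ranges.

    Assumption: entries with the same (ecosystem, name) are the same package.
    Other per-entry fields are taken from the first entry seen for that package.
    """
    merged = {}
    for affected in record.get("affected", []):
        package = affected.get("package", {})
        key = (package.get("ecosystem"), package.get("name"))
        if key not in merged:
            merged[key] = copy.deepcopy(affected)
            continue
        target = merged[key]
        # Append the extra ranges instead of keeping a second affected entry.
        extra_ranges = copy.deepcopy(affected.get("ranges", []))
        if extra_ranges:
            target.setdefault("ranges", []).extend(extra_ranges)
        # Add any versions not already listed, preserving order.
        for version in affected.get("versions", []):
            if version not in target.setdefault("versions", []):
                target["versions"].append(version)
    return list(merged.values())
```

This only merges the entries structurally. Whether each range should really start at 0 still has to be checked against the upstream CVE record.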

There are likely a couple of other issues, and I'm planning to write some more linter checks to catch them, but for now, if we can address the above, we will be well on our way to ingesting your data. Thanks!

jess-lowe, Nov 26 '25 00:11

Hey @jess-lowe, thanks for the detailed feedback. I'll go through this and update our processing pipeline based on your findings. I'll ping this PR again once we've made updates.

cdupuis, Nov 26 '25 07:11