Embedding SPDX into binaries
Embedding package information into binaries can enable SCA tools and scanners to detect dependencies and check them for vulnerabilities, without needing a separate mechanism to transfer an SBOM.
- Go binaries embed dependency information by default, and scanners like trivy and syft can detect those dependencies.
- https://github.com/ossf/wg-vulnerability-disclosures/issues/76 seeks a solution to enable scanners to detect software in self-compiled (i.e. non-distro-packaged) software, as is commonly the case in container images.
- A prototype for embedding Rust dependency information into Rust binaries: https://github.com/Shnatsel/rust-audit/. This currently embeds a Rust-specific, compressed JSON section into binaries. The Rust Secure Code Working Group is exploring whether an existing, language-agnostic format could be used instead.
SPDX, or SPDX Lite, documents could seemingly be embedded into a binary by a producer and detected by scanning tools. Some possible drawbacks:
- SPDX documents must contain the date of creation of the document. For this to co-exist with reproducible builds, creators of tools that embed SPDX into binaries would need to consider the options at reproducible-builds.org/de/docs/timestamps/
- The SPDX document couldn't meaningfully contain a checksum of the binary itself
- Increase in binary size. rust-audit compresses dependency info using zlib, which SPDX could also allow
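To make the timestamp and size points concrete, here is a rough sketch of how an embedding tool might produce a deterministic, zlib-compressed payload. It assumes the `SOURCE_DATE_EPOCH` convention from reproducible-builds.org as one way to pin the creation date. The function names, the `example-embedder` tool name, and the trimmed-down JSON document are hypothetical and not a complete, validated SPDX document.

```python
import json
import os
import zlib
from datetime import datetime, timezone


def creation_timestamp() -> str:
    """Return the creation date, pinned by SOURCE_DATE_EPOCH when it is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        moment = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    else:
        # Fallback: current time, which makes the output non-reproducible.
        moment = datetime.now(tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_payload(packages: list) -> bytes:
    """Serialize a minimal SPDX-like JSON document and compress it with zlib."""
    doc = {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": "embedded-sbom",  # hypothetical document name
        "creationInfo": {
            "created": creation_timestamp(),
            "creators": ["Tool: example-embedder"],  # hypothetical tool
        },
        "packages": [
            {
                "SPDXID": f"SPDXRef-Package-{index}",
                "name": name,
                "versionInfo": version,
                "downloadLocation": "NOASSERTION",
            }
            for index, (name, version) in enumerate(packages)
        ],
    }
    # Sorted keys and fixed separators keep the serialized bytes stable.
    raw = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9)


if __name__ == "__main__":
    payload = build_payload([("serde", "1.0.0"), ("libc", "0.2.0")])
    print(len(payload), "compressed bytes")
```

With `SOURCE_DATE_EPOCH` set, two builds with the same inputs and the same zlib version should yield identical bytes. The checksum drawback is untouched by this: the payload still cannot describe the hash of the binary that contains it.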
Are there any reasons that would make SPDX/SPDX Lite an unsuitable format for this use case?
Hi @tofay, good thoughts and good questions. I think this is interesting, and tend to agree that items 1 and particularly 2 from your list are likely to be the major drawbacks to an approach of embedding the document directly in the binary itself.
This may not be directly on point, but two prior discussions that might be of interest to you are at https://github.com/spdx/spdx-spec/issues/439 and https://github.com/spdx/spdx-spec/issues/502. These were about the idea of having a sort of proto-manifest (and in the case of #502 at least, something lighter than SPDX-Lite) in a project or code repo, which could then in theory be auto-generated into a full SPDX document for a recipient of the code.
I don't think this directly answers the question you're thinking about, but the discussions in those threads might be relevant as you're thinking about this (even though I believe both of those were idea threads that haven't yet been agreed-upon or fully baked).
Those linked issues are interesting, thanks. #439 in particular overlaps with this, as it points to a desire to standardize the attachment of an SPDX document to a "package" (in that case a directory). I don't think this use case needs a new sub-format though.
Could the spdx-spec have a new appendix for attaching/embedding scenarios? That would enable scanning tools to look in specific locations.
Relatedly, I also saw that some IANA types are registered for SPDX which can be used to attach SBOMs to OCI artifacts.
### Sample appendix
SPDX Documents may be embedded into or attached to artifacts.
ELF files and Portable Executables: Embed SBOM document into a section named one of:
- `.spdx`
- `.spdx.yaml`
- `.spdx.json`
depending on the format of the document.
<discuss caveats like reproducibility/checksum of binary itself>
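On the scanner side, looking up one of these section names in an ELF file could work roughly as sketched below. `read_elf_section` is a hypothetical helper, not part of any existing tool. As assumptions, it handles only 64-bit little-endian ELF, does not decompress compressed sections, and ignores the extended encodings used for very large section counts.

```python
import struct
from typing import Optional


def read_elf_section(data: bytes, wanted: str) -> Optional[bytes]:
    """Return the raw contents of the section named `wanted`, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")

    # ELF64 header: e_shoff at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def section_header(index: int):
        # Leading fields: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size.
        return struct.unpack_from("<IIQQQQ", data, e_shoff + index * e_shentsize)

    # The section-name string table holds the NUL-terminated section names.
    _, _, _, _, str_off, str_size = section_header(e_shstrndx)
    strtab = data[str_off:str_off + str_size]

    for index in range(e_shnum):
        name_off, _, _, _, offset, size = section_header(index)
        name_end = strtab.index(b"\0", name_off)
        if strtab[name_off:name_end].decode("ascii", "replace") == wanted:
            return data[offset:offset + size]
    return None
```

A scanner could try `.spdx.json`, `.spdx.yaml` and `.spdx` in turn and parse whichever section it finds first. On the producer side, an existing tool such as objcopy can add a section to a built binary.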
OCI artifacts: When SBOMs are attached as ORAS Artifacts to an OCI artifact, the following [artifact type](https://github.com/oras-project/artifacts-spec/blob/main/artifact-manifest.md#oras-artifact-manifest-properties) should be used:
- [application/spdx+json](https://www.iana.org/assignments/media-types/application/spdx+json) for SPDX documents in JSON format
- [text/spdx](https://www.iana.org/assignments/media-types/text/spdx) for SPDX documents in tag-value (key: value) format
Some other concerns:
- PE section names can be at most 8 characters long, so as written `.spdx.yaml` and `.spdx.json` (10 characters each) would not fit
- Compression of the data (native section compression could be used for ELF)
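The 8-character PE limit can be checked against the section names proposed in the sample appendix with a few lines. This is just an illustration; `fits_pe_section_name` and the constant are hypothetical names.

```python
# A PE section header reserves 8 bytes for the section name.
PE_SECTION_NAME_MAX = 8

# Section names proposed in the sample appendix.
PROPOSED_NAMES = [".spdx", ".spdx.yaml", ".spdx.json"]


def fits_pe_section_name(name: str) -> bool:
    """True if the name fits in the 8-byte PE section name field."""
    return len(name.encode("utf-8")) <= PE_SECTION_NAME_MAX


if __name__ == "__main__":
    for name in PROPOSED_NAMES:
        verdict = "fits" if fits_pe_section_name(name) else "too long"
        print(f"{name}: {len(name)} bytes, {verdict}")
```

Only `.spdx` fits, so for PE the document format would have to be signalled some other way, for example with shorter names or by sniffing the section contents.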
Moving to SPDX 3.1 for consideration.