Investigate pre-existing formats for storing dependency info
Apparently there are already a number of formats designed to encode package info: https://gitbom.dev/glossary/sbom/
We need to check whether any of them are suitable for our use case. Notably, we redact some fields such as git repo URLs, and we also include information about enabled features, so our data might not be 100% compatible.
Also, the degree of adoption of these formats needs to be understood; perhaps we should provide conversion utilities, even if we don't end up using the format internally.
Specifically, we need to understand:
- Does anyone actually use those SBOM formats?
- Are any of those formats a good fit for storing our data - perhaps we won't have to invent a custom format after all?
Dumping my notes on formats and SPDX here.
Suggested requirements for the data format:
- Needs to be able to convey Rust crate runtime and build dependencies.
- Needs to be extensible, so that we can add extra information in the future, e.g. statically linked C libraries or build tool versions such as rustc.
- Needs to be easily interoperable with other tools: parsable in Rust and other languages (in particular Go, as used by the syft/trivy SCA tools), and easy for tools to correlate with vulnerability databases (e.g. RustSec).
The Trivy creator asked about this in Embed CPE names into binaries · Issue #76 · ossf/wg-vulnerability-disclosures (github.com). The discussion there loosely points to SBOM formats being more appropriate as a data format than package identification formats (SWID/PURL). In particular, SBOM formats allow expressing the nature of relationships (e.g. build vs. runtime dependency).
It was suggested on Zulip that SPDX is the SBOM format likeliest to reach wider adoption, given its backing by OpenSSF and industry.
There's currently no standardized way to embed SPDX SBOMs into binaries - Embedding SPDX into binaries · Issue #739 · spdx/spdx-spec (github.com).
Some concerns over embedding SPDX SBOMs are:
- Size, as SBOMs can be very large when they include e.g. license information. It's not clear that this is required for the vulnerability use case: in SPDX SBOMs, NOASSERTION could be used as the value for the various license fields (or the SPDX license identifier instead of the full license text). The SBOM could also be compressed prior to embedding (ELF natively supports compressed sections; unsure about PE/Mach-O).
- Impact on reproducibility. The SPDX format includes creation timestamps. If the binary is represented in the SPDX SBOM as a File, then it would need a SHA1 checksum, which couldn't be accurate because embedding the SBOM changes the binary's contents. This could be mitigated by representing the binary as a (Root?) Package of the SBOM and not including file information for the binary itself.
An example representing a binary as an SPDX File looks like:
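To get a rough feel for both concerns, here is a minimal Python sketch. It assumes a toy SPDX-like document with placeholder package names (`crate0`, `crate1`, ...) and a hypothetical helper `make_sbom`; none of this is taken from cargo-auditable or cargo-spdx. It compares raw and zlib-compressed sizes, and checks whether the bytes to embed stay identical when only the creation timestamp changes.

```python
import hashlib
import json
import zlib


def make_sbom(created=None, n_packages=50):
    """Build a toy SPDX-like document; all values are placeholders."""
    doc = {
        "spdxVersion": "SPDX-2.2",
        "packages": [
            {
                "name": f"crate{i}",
                "versionInfo": "0.1.0",
                "licenseConcluded": "NOASSERTION",
                "licenseDeclared": "NOASSERTION",
                "copyrightText": "NOASSERTION",
            }
            for i in range(n_packages)
        ],
    }
    if created is not None:
        doc["creationInfo"] = {"created": created}
    return doc


def serialize(doc):
    """Serialize deterministically: sorted keys, no optional whitespace."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(doc):
    """Hash of the compressed bytes that would be embedded in the binary."""
    return hashlib.sha256(zlib.compress(serialize(doc), 9)).hexdigest()


raw = serialize(make_sbom())
packed = zlib.compress(raw, 9)
print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")

# Same inputs and no timestamp: the embedded bytes are identical.
same = digest(make_sbom()) == digest(make_sbom())

# Same inputs but different creation timestamps: the embedded bytes differ.
differs = digest(make_sbom("2022-08-01T18:44:38Z")) != digest(
    make_sbom("2022-08-02T09:00:00Z")
)
print(same, differs)
```

The repeated NOASSERTION fields compress well, which suggests the size concern may be less severe after compression, while the timestamp alone is enough to break byte-for-byte reproducibility unless it is fixed or omitted.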
{
"spdxVersion": "SPDX-2.2",
"dataLicense": "CC0-1.0",
"SPDXID": "SPDXRef-DOCUMENT",
"name": "baz.spdx.json",
"documentNamespace": "https://foo.bar/",
"creationInfo": {
"created": "2022-08-01T18:44:38Z",
"creators": [
"Tool: cargo-spdx 0.1.0"
]
},
"packages": [
{
"copyrightText": "NOASSERTION",
"downloadLocation": "NOASSERTION",
"externalRefs": [
{
"referenceCategory": "PACKAGE_MANAGER",
"referenceLocator": "pkg:cargo/bar@0.1.0",
"referenceType": "purl"
}
],
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "NOASSERTION",
"name": "bar",
"SPDXID": "SPDXRef-bar-0.1.0",
"versionInfo": "0.1.0"
},
{
"copyrightText": "NOASSERTION",
"downloadLocation": "NOASSERTION",
"externalRefs": [
{
"referenceCategory": "PACKAGE_MANAGER",
"referenceLocator": "pkg:cargo/baz@0.1.0",
"referenceType": "purl"
}
],
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "NOASSERTION",
"name": "baz",
"SPDXID": "SPDXRef-baz-0.1.0",
"versionInfo": "0.1.0"
},
{
"copyrightText": "NOASSERTION",
"downloadLocation": "NOASSERTION",
"externalRefs": [
{
"referenceCategory": "PACKAGE_MANAGER",
"referenceLocator": "pkg:cargo/foo@0.1.0",
"referenceType": "purl"
}
],
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "NOASSERTION",
"name": "foo",
"SPDXID": "SPDXRef-foo-0.1.0",
"versionInfo": "0.1.0"
}
],
"files": [
{
"checksums": [
{
"algorithm": "SHA1",
"checksumValue": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
}
],
"copyrightText": "NOASSERTION",
"fileName": "baz",
"fileTypes": [
"BINARY"
],
"licenseConcluded": "NOASSERTION",
"SPDXID": "SPDXRef-File-baz"
}
],
"relationships": [
{
"relatedSpdxElement": "SPDXRef-baz-0.1.0",
"relationshipType": "GENERATED_FROM",
"spdxElementId": "SPDXRef-File-baz"
},
{
"relatedSpdxElement": "SPDXRef-bar-0.1.0",
"relationshipType": "DEPENDS_ON",
"spdxElementId": "SPDXRef-File-baz"
},
{
"relatedSpdxElement": "SPDXRef-baz-0.1.0",
"relationshipType": "DEPENDS_ON",
"spdxElementId": "SPDXRef-File-baz"
},
{
"relatedSpdxElement": "SPDXRef-foo-0.1.0",
"relationshipType": "DEPENDS_ON",
"spdxElementId": "SPDXRef-File-baz"
}
]
}
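As a sketch of how an SCA tool might consume a document shaped like the one above, the following Python snippet walks the `relationships` array and collects the purls of everything a given element depends on. The embedded JSON is a trimmed-down copy of the example, and the helper name `purls_depended_on` is made up for illustration; real tools such as syft or trivy use their own SPDX parsers.

```python
import json

# Trimmed-down copy of the example document; only the fields used below.
SBOM_JSON = """
{
  "packages": [
    {"SPDXID": "SPDXRef-bar-0.1.0", "name": "bar", "versionInfo": "0.1.0",
     "externalRefs": [{"referenceCategory": "PACKAGE_MANAGER",
                       "referenceType": "purl",
                       "referenceLocator": "pkg:cargo/bar@0.1.0"}]},
    {"SPDXID": "SPDXRef-foo-0.1.0", "name": "foo", "versionInfo": "0.1.0",
     "externalRefs": [{"referenceCategory": "PACKAGE_MANAGER",
                       "referenceType": "purl",
                       "referenceLocator": "pkg:cargo/foo@0.1.0"}]}
  ],
  "relationships": [
    {"spdxElementId": "SPDXRef-File-baz", "relationshipType": "GENERATED_FROM",
     "relatedSpdxElement": "SPDXRef-baz-0.1.0"},
    {"spdxElementId": "SPDXRef-File-baz", "relationshipType": "DEPENDS_ON",
     "relatedSpdxElement": "SPDXRef-bar-0.1.0"},
    {"spdxElementId": "SPDXRef-File-baz", "relationshipType": "DEPENDS_ON",
     "relatedSpdxElement": "SPDXRef-foo-0.1.0"}
  ]
}
"""


def purls_depended_on(doc, element_id):
    """Return purls of packages that `element_id` DEPENDS_ON."""
    by_id = {pkg["SPDXID"]: pkg for pkg in doc.get("packages", [])}
    purls = []
    for rel in doc.get("relationships", []):
        if rel["spdxElementId"] != element_id:
            continue
        if rel["relationshipType"] != "DEPENDS_ON":
            continue
        pkg = by_id.get(rel["relatedSpdxElement"])
        if pkg is None:
            continue  # relationship points at an element we do not know
        for ref in pkg.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                purls.append(ref["referenceLocator"])
    return purls


doc = json.loads(SBOM_JSON)
deps = purls_depended_on(doc, "SPDXRef-File-baz")
print(deps)  # ['pkg:cargo/bar@0.1.0', 'pkg:cargo/foo@0.1.0']
```

The purls are what a scanner would then correlate with a vulnerability database; the lookup itself is out of scope for this sketch.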
Rust support for SPDX SBOM format:
- doubleopen-project/spdx-rs: SPDX Documents in Rust (github.com) exists for serializing/deserializing
- cargo-spdx has some serialization support for SPDX. We'd want to share SPDX support with that
- (There's a JSON schema for SPDX, so should be straightforward to generate serde representations should existing ones not be suitable)
More questions to consider regarding use of SPDX in cargo-auditable:
Does it actually make it easier to use the embedded data?
- Considering both Rust tooling (cargo audit) and external tools (go-rustaudit/syft)
Is it worth using a different format at all without a resolution to Embed CPE names into binaries · Issue #76 · ossf/wg-vulnerability-disclosures (github.com)?
- The win from that would be interoperability. If we used the SPDX format but in a non-standardized section header, then we'd still have to teach SCA tools to look in that location.
- The existing format conveys similar information to Cargo.lock. An advantage of this is that SCA tools are generally capable of reading Cargo.lock files, so the existing format is likely to be easy to integrate with SCA tools' existing Rust support (and this was the case when integrating with syft). It's unclear whether that would apply to non-crate information (e.g. rustc version or statically linked C libraries).
Re "does anyone actually use these formats": both trivy and grype (the vulnerability scanning tool that works with/uses syft) are capable of reading SBOMs in multiple formats, e.g. SPDX and CycloneDX.
If there was a standardized section name for embedding SBOMs then cargo-auditable could use that and these tools could be updated to detect that. And without section name standardization, cargo-auditable could use SPDX, and go-rustaudit could extract the SBOM and expose the JSON for these tools to parse with their existing parsers.
Hi, I just heard from you on the Rustacean Station podcast - really cool stuff here! :-)
I've been thinking, talking and exchanging about this whole topic here for a while now, so let me add some references:
- AMI is planning to roll out an entire ecosystem for firmware SBoM, likely based on CycloneDX
- CycloneDX is discussing evidence inclusion for the upcoming spec release
- in another firmware area, uswid is being suggested and prototyped with, though it lacks proof/references; it's rather decoupled from the actual binaries it describes
- there is an org named Veraison dealing with verification and attestation at large, mostly rooted in TCG DICE, including SWID, which is an ISO/NISTIR thing
- I have started a very slow project in my spare time trying to aggregate everything related to platforms and systems, including the idea of an Auditable Firmware Implementation
- I am also the author of Fiedka the Firmware Editor, for which I am currently drafting SBoM and annotation features, on which I presented at OSFC this year, and with some other perspectives just last Saturday at a local event
When I asked who else would be interested in the topic, I was invited to the CycloneDX Slack, where people discuss the entire SBoM topic very broadly. Maybe that's also for you. :-)
Finally, I am quite involved in the oreboot firmware project, where I'm seeking to introduce SBoM as well, likely based on CycloneDX, for which there is also a Rust implementation.
That shall be it for now; feel free to poke back at me should you have any further questions etc. :partying_face:
Thanks for the links! Having SBOMs in firmware would certainly be cool!
So far I've found everything not specifically designed for inclusion into binaries unsuitable, for two reasons:
- Inclusion of dates messes up reproducible builds
- The formats are very verbose and/or require including lots of information that is not relevant for the purposes of a security audit, increasing the binary size considerably.
I'm looking to talk to some people who have worked on the SBOM embedded in Go binaries by default. They also rolled their own JSON-based format, and perhaps we could collaborate on something more generic or at least that could be shared between the two.
FWIW Syft can already convert from the cargo auditable data format to CycloneDX.
https://github.com/google/osv-scanner supports "SPDX and CycloneDX SBOMs using Package URLs" - https://google.github.io/osv-scanner/usage/#specify-sbom
As an alternative/precursor to storing the dependency info in those SBOM formats, perhaps rust-audit-info could extract the existing format and do a "rough" conversion to these SBOM formats, so that integration with these other tools can be explored, determining what (if any) extra fields need to be stored in the Rust binaries in order to get reasonable compatibility with these tools.
Syft can already perform such a conversion today.
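For anyone wanting to explore what such a rough conversion involves without going through Syft, here is a minimal Python sketch. The input shape (a `packages` list whose entries carry `name`, `version`, an optional `kind`, and `dependencies` as indices into the same list) is an assumption loosely modelled on the existing format, and `to_spdx` is a hypothetical helper, not an existing API. Licence fields are filled with NOASSERTION, and no percent-encoding is applied to the purls.

```python
import json
import re


def spdx_id(pkg):
    """SPDX IDs may only contain letters, digits, '.' and '-'."""
    raw = f"{pkg['name']}-{pkg['version']}"
    return "SPDXRef-" + re.sub(r"[^A-Za-z0-9.-]", "-", raw)


def to_spdx(audit, name, namespace, created):
    """Roughly convert the assumed audit data into an SPDX-like dict."""
    pkgs = audit["packages"]
    spdx_packages = []
    relationships = []
    for pkg in pkgs:
        spdx_packages.append({
            "SPDXID": spdx_id(pkg),
            "name": pkg["name"],
            "versionInfo": pkg["version"],
            "downloadLocation": "NOASSERTION",
            "licenseConcluded": "NOASSERTION",
            "licenseDeclared": "NOASSERTION",
            "copyrightText": "NOASSERTION",
            "externalRefs": [{
                "referenceCategory": "PACKAGE_MANAGER",
                "referenceType": "purl",
                # Simplification: names/versions are used verbatim.
                "referenceLocator": f"pkg:cargo/{pkg['name']}@{pkg['version']}",
            }],
        })
        for index in pkg.get("dependencies", []):
            dep = pkgs[index]
            if dep.get("kind") == "build":
                # Build dependencies are expressed from the dependency's side.
                relationships.append({
                    "spdxElementId": spdx_id(dep),
                    "relationshipType": "BUILD_DEPENDENCY_OF",
                    "relatedSpdxElement": spdx_id(pkg),
                })
            else:
                relationships.append({
                    "spdxElementId": spdx_id(pkg),
                    "relationshipType": "DEPENDS_ON",
                    "relatedSpdxElement": spdx_id(dep),
                })
    return {
        "spdxVersion": "SPDX-2.2",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": name,
        "documentNamespace": namespace,
        "creationInfo": {"created": created, "creators": ["Tool: sketch"]},
        "packages": spdx_packages,
        "relationships": relationships,
    }


# Hypothetical input: baz depends on bar at runtime and on foo at build time.
audit = {"packages": [
    {"name": "bar", "version": "0.1.0"},
    {"name": "baz", "version": "0.1.0", "dependencies": [0, 2]},
    {"name": "foo", "version": "0.1.0", "kind": "build"},
]}
sbom = to_spdx(audit, "baz.spdx.json", "https://foo.bar/", "2022-08-01T18:44:38Z")
print(json.dumps(sbom["relationships"], indent=2))
```

Passing `created` in explicitly keeps the caller in control of the timestamp, which matters for the reproducibility concern raised earlier in this thread. Fields that the existing format redacts, such as git repo URLs, simply have no counterpart in the output here.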