Introduce "SBOM" as a new data structure in DejaCode
Problem
The SBOM community has identified multiple types of SBOM:
- Software Bill of Materials (SBOM) -- the default of course
- Software-as-a-Service Bill of Materials (SaaSBOM)
- Hardware Bill of Materials (HBOM)
- Machine Learning Bill of Materials (ML-BOM)
- Cryptography Bill of Materials (CBOM)
- Manufacturing Bill of Materials (MBOM)
- Operations Bill of Materials (OBOM)
This list is ever-expanding; other variants include:
- As-built SBOM
- As-deployed SBOM
The SBOM standards (CycloneDX and SPDX) implement these types in different ways.
The DejaCode Product definition is quite flexible, and Packages and Components can be defined to identify the kinds of things that exist in each SBOM type. However, there is no standard way to identify the SBOM Type associated with a specific DejaCode Product. This is further complicated by the potential need to extract several SBOMs of different types from the same Product Inventory.
Benefits
A new SBOM Type would address that need, and would support additional functionality related to various SBOM Types. Eventually this would support the generation of SBOMs from DejaCode that are more specific than the generic SBOM.
Design Challenges
Define SBOM Types in a new user-editable table in DejaCode?
OR
Define SBOM Types in a standard list in the source code?
OR
Do not validate SBOM Type and let it be free-form text?
Some advantages of an SBOM Type table would be:
- ability to describe exactly what the SBOM Type means to the organization
- ability to associate the SBOM Type with an SBOM Template (future ability -- this might align, for example, with SPDX SBOM "Profiles")
- ability to associate the SBOM Type with a specific SBOM Generator (program, DejaCode Report, etc. -- also future).
Assumptions
A DejaCode Product definition can be associated with exactly one SBOM Type in a meaningful way.
There is another useful perspective on SBOM Types at: https://www.linkedin.com/pulse/types-uses-sboms-dirk-riehle-3jcpe/
SBOMs created from the supplier’s development process:
- Design. A Design SBOM is created from planning documents like prospective product architectures. As a consequence, a Design SBOM may not be an accurate reflection of what will be shipped eventually. It may be helpful to buyers in a supply chain to prepare for what’s to come their way.
- Source. A Source SBOM provides a static picture of the supplier’s source code and its dependencies, as found in the repositories. It can be helpful to identify vulnerabilities, but does not provide a complete picture as it omits any build or runtime dependencies.
- Build. A Build SBOM is created from the build process of the supplier as it compiles source code and assembles the final package for delivery to customers. Aimed at operations, it does not include components needed for building and testing. It may still miss dynamic dependencies though.
SBOMs created by the buyer (or others) through analysis:
- Analyzed. An Analyzed SBOM is created from software composition analysis of the static delivered software. This is almost always a binary analysis of the artifact. As such, an Analyzed SBOM will miss much, but it may discover components that the suppliers may have overlooked.
- Deployed. A Deployed SBOM is created from analyzing the deployed software. After deployment, additional components may have been loaded or may have become visible that were not identifiable before. Like Analyzed, Deployed SBOMs complement the supplier’s SBOMs.
- Runtime. A Runtime SBOM is created from observing the running software (often requiring instrumentation). Of the SBOMs created through analysis, a Runtime SBOM provides the most comprehensive picture, but it will miss components that have not been activated and are not visible yet.
There's more: SBOMs can also conform to SPDX profiles such as:
- Security
- Licensing
- AI
- Data
- Build
- Lite
- Core
- Software
Initial sketch of the SBOM Type table:
SBOM Type
Short description: The SBOM Type, identified by a label, identifies the scope of a Software Bill of Materials.
Long description: Define the SBOM Types that support your software design, development and deployment life cycles.

| Field | Description |
| --- | --- |
| label | Short name of the SBOM Type. |
| text | Descriptive, explanatory text about the SBOM Type. |
| default_on_addition | Indicates this instance is automatically assigned by the application to the SBOM when it is initially created. |
| specification | A URL to the technical documentation of the SBOM Type. |
| generator_api | An API that accepts the SBOM identifier to generate the SBOM document. |
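To make the table sketch more concrete, here is a minimal, framework-free illustration in Python. It is not DejaCode code: the field names come from the sketch above, while the `SBOMType` class, the `default_type` helper, the sample rows, and the rule that at most one type may be flagged `default_on_addition` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SBOMType:
    """One row of the proposed SBOM Type table (hypothetical class)."""

    label: str                         # short name of the SBOM Type
    text: str = ""                     # explanatory text
    default_on_addition: bool = False  # assigned automatically to a new SBOM
    specification: str = ""            # URL to technical documentation
    generator_api: str = ""            # API that generates the SBOM document


def default_type(types: list[SBOMType]) -> Optional[SBOMType]:
    """Return the type flagged default_on_addition, or None if none is flagged.

    Assumption: at most one type may carry the flag.
    """
    defaults = [t for t in types if t.default_on_addition]
    if len(defaults) > 1:
        raise ValueError("at most one SBOM Type may be the default")
    return defaults[0] if defaults else None


# Sample rows, for illustration only.
SBOM_TYPES = [
    SBOMType(label="Build", text="Created from the build process.",
             default_on_addition=True),
    SBOMType(label="Runtime", text="Created from observing running software."),
]

print(default_type(SBOM_TYPES).label)
```

In DejaCode itself this would presumably be a regular database-backed model so that the table stays user-editable; the dataclass only shows the shape of a row.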
Here is a link to the CISA definitions for SBOM Types: https://www.cisa.gov/sites/default/files/2023-04/sbom-types-document-508c.pdf
It defines these SBOM Types, which are generally associated with particular phases in the lifecycle of a Product, and provide the basis for the SBOM Types described by Dirk Riehle (see previous comment):
- Design
- Source
- Build
- Analyzed
- Deployed
- Runtime
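If the "standard list in the source code" option from the Design Challenges were chosen, the six CISA types could be captured roughly as follows. This is only a sketch: the `CisaSbomType` enum and the `parse_sbom_type` helper are hypothetical names, not existing DejaCode identifiers.

```python
from enum import Enum


class CisaSbomType(str, Enum):
    """The six SBOM Types named in the CISA document (hypothetical enum)."""

    DESIGN = "Design"
    SOURCE = "Source"
    BUILD = "Build"
    ANALYZED = "Analyzed"
    DEPLOYED = "Deployed"
    RUNTIME = "Runtime"


def parse_sbom_type(value: str) -> CisaSbomType:
    """Case-insensitive lookup; raises ValueError for labels not in the list."""
    normalized = value.strip().lower()
    for member in CisaSbomType:
        if member.value.lower() == normalized:
            return member
    raise ValueError(f"unknown SBOM Type: {value!r}")


print(parse_sbom_type(" build ").value)
```

A fixed list like this validates input cheaply, but unlike a user-editable table it cannot carry organization-specific descriptions, templates, or generators.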
I am not sure how an SBOM type would apply to a product. A product exists outside of SBOM definitions entirely. Some of its packages and components may have a defined purpose and may be deployed or not. This may be what feeds into these types?
So, if we need to support these SBOM "types", we would need first to define how an inventory can be sliced and categorized.
And before we do this, I would like to see an actual real-world use case and the value gained from adding this extra layer of complexity, beyond the categorization we already have.
You are correct that multiple SBOM Types can apply to a Product so we probably need some new data structure that is related to, but separate from Product to handle SBOMs and their Types. See also https://github.com/aboutcode-org/scancode-toolkit/issues/3915.
We have explored the idea of Item Lists in the past; perhaps an SBOM is essentially the same thing, with the added feature that we can assign an SBOM Type to it, which would describe its scope and purpose. So the new SBOM Type would apply directly to an SBOM, not a Product.
We could expand the Product Inventory capabilities by introducing the ability to attach one or more SBOMs to a Product, also a capability that we have discussed previously. This might be the best way to think about supporting the diversity of SBOM Types in DejaCode.
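The relationship described here (a Product carrying one or more SBOMs, each with its own SBOM Type) could look roughly like this. The `SBOM` and `Product` classes, their fields, and the sample identifiers are assumptions for illustration and do not reflect the actual DejaCode models.

```python
from dataclasses import dataclass, field


@dataclass
class SBOM:
    """A named list of inventory items with a scope (hypothetical class)."""

    name: str
    sbom_type: str  # label of an SBOM Type, e.g. "Build"
    items: list[str] = field(default_factory=list)  # package/component ids


@dataclass
class Product:
    """A product that can carry several SBOMs (hypothetical class)."""

    name: str
    version: str
    sboms: list[SBOM] = field(default_factory=list)

    def attach(self, sbom: SBOM) -> None:
        """Attach one more SBOM to this product."""
        self.sboms.append(sbom)

    def sboms_of_type(self, sbom_type: str) -> list[SBOM]:
        """Return the attached SBOMs that have the given SBOM Type label."""
        return [s for s in self.sboms if s.sbom_type == sbom_type]


product = Product(name="mysoftware", version="1.0")
product.attach(SBOM("from-repo", "Source", ["pkg:pypi/django@4.2"]))
product.attach(SBOM("from-ci", "Build", ["pkg:pypi/django@4.2"]))
print([s.name for s in product.sboms_of_type("Build")])
```

With the type on the SBOM rather than on the Product, the "exactly one SBOM Type" assumption holds per SBOM, while a Product is free to have several.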
Refer to a previous DejaCode issue somewhat related to this discussion: #87
Research is needed to identify the canonical way to specify an SBOM Type in a generated SBOM for both CycloneDX and SPDX SBOMs.
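As one possible starting point for that research: CycloneDX 1.5 introduced a `metadata.lifecycles` array whose entries name a `phase`, and SPDX 3.0 appears to offer a comparable SBOM type property on its Software-profile `Sbom` element. The sketch below covers only the CycloneDX side. The `cyclonedx_skeleton` function is hypothetical, and the phase list should be verified against the schema version being targeted. How the CISA types map onto these phases is left open.

```python
import json

# Phase names as understood from the CycloneDX 1.5 schema (metadata.lifecycles);
# verify against the schema before relying on this list.
CYCLONEDX_PHASES = {
    "design", "pre-build", "build", "post-build",
    "operations", "discovery", "decommission",
}


def cyclonedx_skeleton(phase: str) -> dict:
    """Build a minimal CycloneDX-style document that declares a lifecycle phase."""
    if phase not in CYCLONEDX_PHASES:
        raise ValueError(f"not a known CycloneDX lifecycle phase: {phase!r}")
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "metadata": {"lifecycles": [{"phase": phase}]},
        "components": [],  # inventory items would be added here
    }


print(json.dumps(cyclonedx_skeleton("build"), indent=2))
```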
Refer to https://docs.google.com/document/d/1_K0qX_IKrfYuezPUp_fVDQIp_tGQ2sEk/edit?usp=sharing&ouid=117241222429542576816&rtpof=true&sd=true for in-progress design details.
@DennisClark I get a 401 error on the Google Doc.
I identified the desire to store multiple SBOMs as well: SBOM from version control, SBOM when building code, SBOM when building container, multiple SBOMs from tools that scan the infrastructure (custom software and ready made software).
So this is not about AI and SaaS, but about different steps in the deployment process.
The need stems from a situation with an extensive infrastructure running software from different origins. Insights are gathered from different viewpoints, all of which are needed to get the complete picture. It would be different if ALL software passed through a unified scanning process before deployment.
My ideas about modelling this:
- Stick to common definitions of SBOM type (source, built, ...)
- Use scan tool info to make them unique (for example, 2 different tools scanning container images both produce the same type of SBOM)
- Have a preferred SBOM type and tool as default view
- Merge the component lists into one encompassing list of components, and set component properties for SBOM tool and SBOM type. These can be used for filtering components.
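The merge-and-filter idea in the list above can be sketched as follows. This is an illustration only: keying components by package URL, the `MergedComponent` class, both helper functions, and the tool names `tool-a` and `tool-b` are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MergedComponent:
    """A component plus every (sbom_type, tool) pair that reported it."""

    purl: str
    sources: set[tuple[str, str]] = field(default_factory=set)


def merge_sboms(sboms):
    """Merge SBOMs given as (sbom_type, tool, [purl, ...]) into one mapping."""
    merged: dict[str, MergedComponent] = {}
    for sbom_type, tool, purls in sboms:
        for purl in purls:
            component = merged.setdefault(purl, MergedComponent(purl))
            component.sources.add((sbom_type, tool))
    return merged


def filter_components(merged, sbom_type=None, tool=None):
    """Return sorted purls reported by a matching SBOM type and/or tool."""
    return sorted(
        c.purl
        for c in merged.values()
        if any(
            (sbom_type is None or t == sbom_type) and (tool is None or k == tool)
            for t, k in c.sources
        )
    )


merged = merge_sboms([
    ("source", "tool-a", ["pkg:pypi/django@4.2", "pkg:pypi/pytest@8.0"]),
    ("build", "tool-a", ["pkg:pypi/django@4.2"]),
    ("build", "tool-b", ["pkg:pypi/django@4.2", "pkg:deb/debian/openssl@3.0"]),
])
print(filter_components(merged, sbom_type="build", tool="tool-b"))
```

Because each component keeps every type and tool pair that reported it, a preferred pair can serve as the default view without discarding what the other scans found.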
An alternative is to have duplicate registrations of products: 'mysoftware-source', 'mysoftware-built', 'mysoftware-container', etc.
I'm not a DejaCode user (yet), but this feature is on my want-to-have list, which is why I share my perspective.
@nicorikken the access error is resolved ... try again https://docs.google.com/document/d/1_K0qX_IKrfYuezPUp_fVDQIp_tGQ2sEk/edit
@nicorikken with respect to your requirement of storing many SBOM "types" created by different tools, this is an intriguing idea! I guess the core issue is that tools return results that are so poorly aligned and so different that I question the value of that, but on the other hand I get why you would want this.
PS: we drafted a report comparing the outputs of many popular and poor SCA tools .... @adaaaam can you share a copy with @nicorikken ?
SBOM Types will likely mature sooner rather than later, so we should support them early in the AboutCode data model to prepare for later implementation.