Define terminology
We need to agree on terminology
- Avoid key terms used in SBOMs to avoid confusion
- Base on existing terminology
- Proposal: https://niccs.cisa.gov/cybersecurity-career-resources/vocabulary or https://csrc.nist.gov/glossary
Possibly create a doc in the repo for TEA-specific terms like
- TEA Index
- TEA Bundle
- TEA collection
- TEApot :-)
I compiled a list of terms we are (or may be) using, together with any definitions found in the referenced documents (not necessarily fitting definitions, just for faster search).
Blank entries don't have definitions in these 2 glossary documents.
Index
Bundle SPDX - Bundle A collection of Elements that have a shared context.
Collection https://csrc.nist.gov/glossary/term/collection The first phase of the computer and network forensics process, which involves identifying, labeling, recording, and acquiring data from the possible sources of relevant data, while following guidelines and procedures that preserve the integrity of the data.
Component https://csrc.nist.gov/glossary/term/component
- An element of a large system—such as an identity card, issuer, card reader, or identity verification support—within the PIV system.
- A discrete identifiable information technology asset that represents a building block of a system and may include hardware, software, and firmware.
- A software object, meant to interact with other components, encapsulating certain functionality or a set of functionalities. A component has a clearly defined interface and conforms to a prescribed behavior common to all components within an architecture.
- A discrete identifiable IT asset that represents a building block of an information system.
- Any hardware, software, and/or firmware required to construct a CKMS.
- An element such as a fingerprint capture station or card reader used by an issuer, for which [FIPS 201-2] has defined specific requirements.
- Discrete identifiable information technology assets that represent a building block of a system and include hardware, software, firmware, and virtual machines.
- A discrete identifiable information or operational technology asset that represents a building block of a system and may include hardware, software, and firmware.
- An entity with discrete structure, such as an assembly or software module, within a system considered at a particular level of analysis. Component refers to a part of a whole, such as a component of a software product, a component of a software identification tag, etc.
- A hardware, software, or firmware part or element of a larger system with well-defined inputs and outputs and a specific function.
- A hardware, software, firmware part or element of a larger PNT system with well-defined inputs and outputs and a specific function.
Project https://csrc.nist.gov/glossary/term/project Endeavor with defined start and finish criteria undertaken to create a product or service in accordance with specified resources and requirements.
Product https://csrc.nist.gov/glossary/term/product
- Result of a process.
- Part of the equipment (hardware, software and materials) for which usability is to be specified or evaluated.
- A complete set of computer programs, procedures and associated documentation and data designed for delivery to a software consumer.
- A software application that has one or more capabilities.
Branch
Feature Set
Release https://csrc.nist.gov/glossary/term/release A collection of new and/or changed configuration items which are tested and introduced into a production environment together.
Artifact https://csrc.nist.gov/glossary/term/artifact
- A piece of evidence
- Work products that are produced and used during a project to capture and convey information (e.g., models, source code).
- A piece of evidence, such as text or a reference to a resource, that is submitted to support a response to a question.
Attachment
2024-08-03: Edit - Added component, bundle from SPDX
I think we need to define first what the atomic unit (product?) in the index is.
For a hardware producer this might be a physical device, e.g. a WiFi router. These, however, usually have multiple detachable parts, e.g. a charger. If the charger is also sold separately, should the WiFi router be a bundle and not a product?
For a software company, I would say that a product is each element that can be downloaded separately, even if it is not sold separately. For example, Apache Tomcat is a product with a binary distribution that you can download and run. However, each one of the Apache Tomcat libraries is available separately, and users often use some of these libraries to embed Tomcat in their project.
I think the product is something the manufacturer hands over to the user. The same product can have many names and a "product" can be a bundle of many bundles of products.
In my view of the API, the "product" is what is distributed and is the entry point in the index. The creator of the TEI defines it. One entry point can have multiple TEIs pointing to it.
At the other end we have the TEI collection that is tied to a single product/unit with a version and artefacts for it.
CycloneDX defines the thing that a BOM describes as "BOM metadata includes the supplier, manufacturer, and target component for which the BOM describes" - everything is a "component".
I don't think I make things clearer... It's early morning :-)
Took a stab at combining Olle's original proposal with what I had in the API based on our research at Reliza and definitions above. Would appreciate feedback.
Component is a discrete identifiable information technology asset that represents a building block of a system and may include hardware, software, and firmware. Component has Branches that have Releases. A Release consists of one or more Deliverables. For example, for a software Release, a set of Deliverables may be: Debian package, MSI distribution, Container image. A Deliverable has Artifacts that can be a BOM or an Attachment, such as VEX, VDR, Attestation.
Bundle is a marketable representation of an identifiable information technology product. Bundle has Feature Sets that have Collections. A Collection consists of one or more Deliverables. A Collection is what the end-user buys. For example, for hardware, that would be a physical package that the client receives; for software, that would be a complete distribution that the client receives.
TEA Index represents a bundle - showing all Collections that are part of this Bundle.
I think for the TEA spec, the part related to Component may actually be omitted or made optional, but it's nice to keep this in mind in any case.
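To make the proposed hierarchy easier to discuss, here is a minimal Python sketch of the terms as plain data classes. It only mirrors the relationships described in the text (Component > Branch > Release > Deliverable > Artifact, and Bundle > Feature Set > Collection > Deliverable). All class names, field names, the `tea_index` helper and the example data are assumptions made for illustration, not part of any TEA spec or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ArtifactKind(Enum):
    BOM = "bom"
    ATTACHMENT = "attachment"  # e.g. VEX, VDR, Attestation


@dataclass
class Artifact:
    name: str
    kind: ArtifactKind


@dataclass
class Deliverable:
    name: str  # e.g. "Debian package", "MSI distribution", "Container image"
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Release:
    version: str
    deliverables: List[Deliverable] = field(default_factory=list)


@dataclass
class Branch:
    name: str
    releases: List[Release] = field(default_factory=list)


@dataclass
class Component:
    name: str
    branches: List[Branch] = field(default_factory=list)


@dataclass
class Collection:
    name: str  # what the end-user buys
    deliverables: List[Deliverable] = field(default_factory=list)


@dataclass
class FeatureSet:
    name: str
    collections: List[Collection] = field(default_factory=list)


@dataclass
class Bundle:
    name: str
    feature_sets: List[FeatureSet] = field(default_factory=list)


def tea_index(bundle: Bundle) -> List[Collection]:
    """Hypothetical helper: list every Collection that is part of a Bundle."""
    return [c for fs in bundle.feature_sets for c in fs.collections]


# Hypothetical example data, loosely following the software Release example.
deb = Deliverable("Debian package", [
    Artifact("sbom.cdx.json", ArtifactKind.BOM),
    Artifact("vex.json", ArtifactKind.ATTACHMENT),
])
image = Deliverable("Container image", [Artifact("image-sbom.cdx.json", ArtifactKind.BOM)])

component = Component("example-db", [Branch("main", [Release("1.0.1", [deb, image])])])
bundle = Bundle("Example DB", [
    FeatureSet("Community", [Collection("Example DB 1.0.1", [deb, image])]),
])

print([c.name for c in tea_index(bundle)])
```

Note that the same Deliverable objects are reachable from both the Component side and the Bundle side, which is why the Component part could be optional in the spec without changing what a Collection contains.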
Below is the flow chart representation of the same thing I wrote in text:
Alternative chart version that fits some additional use-cases:
Here a Variant is a physical package that the client receives. And a Collection is a set of Variants that are marketable together. Version would be assigned at a Variant level but Collection is assumed to have some shared Version prefix or same Marketing Version for all Variants inside it.
I.e., a Collection could mean a database software distribution marketed with Version 1.0.1. Then it may have several platform specific Variants: Windows, Linux32, Linux64, Mac. Variants may also have additional dimensions represented as a flattened set, i.e. North American Windows Variant; EMEA Linux64 Variant, etc.
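A rough Python sketch of the Collection/Variant idea from this alternative version, in case it helps the discussion. It assumes a Collection carries one Marketing Version and that each Variant carries its own Version, with extra dimensions such as region flattened into the Variant. The field names, the `label` format and the prefix check are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    platform: str                 # e.g. "Windows", "Linux64"
    version: str                  # Version is assigned at the Variant level
    region: Optional[str] = None  # optional extra dimension, flattened in

    @property
    def label(self) -> str:
        parts = [self.region, self.platform] if self.region else [self.platform]
        return " ".join(parts) + " Variant"


@dataclass
class Collection:
    name: str
    marketing_version: str
    variants: List[Variant] = field(default_factory=list)

    def mismatched_variants(self) -> List[Variant]:
        """Variants whose Version does not start with the shared prefix.

        This is a naive string-prefix check, so "1.0.10" would also match
        the prefix "1.0.1"; a real rule would need to be agreed on.
        """
        return [
            v for v in self.variants
            if not v.version.startswith(self.marketing_version)
        ]


# Hypothetical database distribution marketed as Version 1.0.1.
db = Collection("Example Database", "1.0.1", [
    Variant("Windows", "1.0.1-win.3", region="North American"),
    Variant("Linux64", "1.0.1", region="EMEA"),
    Variant("Mac", "1.0.2"),
])

print([v.label for v in db.variants])
print([v.label for v in db.mismatched_variants()])
```

In this example only the Mac Variant is flagged, because its Version does not share the Collection's Marketing Version prefix.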
Updated diagram below:
Edited on 2024-08-05: Changed the term Distribution to Variant
I don't see any reason to separate BOMs from other files in phase one. Did you have any specific reason for doing so?
That came up in conversation with Steve on a Koala's call when you were away. The reason is that certain fields differ between BOMs and attachments. It's possible to unify those but then we'll have API fields conditional on type. In any case I don't think it's very important to solve at this stage - we can defer to when we are more focused on the API itself.
Stuff that happens when you're on holiday. ha ha :-)
I still think we need one attachment object - could be that we have additional attributes for BOMs, like the BOM identifier.
Checked https://github.com/CycloneDX/transparency-exchange-api/pull/22/files and I already had an optional BOM identifier in the definition of an artefact.
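For comparison, a single attachment object with an optional BOM identifier could look roughly like the Python sketch below. The field names here are made up for illustration and are not taken from the pull request; the only point is that one type can cover both BOMs and other attachments if the BOM-specific attribute is optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artifact:
    # All field names are hypothetical, for discussion only.
    name: str
    media_type: str
    url: str
    bom_identifier: Optional[str] = None  # only set when the artifact is a BOM

    @property
    def is_bom(self) -> bool:
        return self.bom_identifier is not None


sbom = Artifact(
    name="sbom.cdx.json",
    media_type="application/vnd.cyclonedx+json",
    url="https://example.com/sbom.cdx.json",
    bom_identifier="urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
)
vex = Artifact(
    name="vex.json",
    media_type="application/json",
    url="https://example.com/vex.json",
)

print(sbom.is_bom, vex.is_bom)
```

The trade-off mentioned earlier still applies: with one object, any BOM-only field becomes conditional on the type of artifact, here expressed as an optional attribute.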
I'll need to see the use cases to understand how the above terminology applies and how we would translate those terms to SPDX terminology.
The definition of Component requiring some form of branch / release may (or may not) be overly constraining depending on how it is used.
In the SPDX discussions, we found it extremely useful to separate the concept of an Artifact from the concept of a Package. The latter is a form of Artifact which is "released or made available through distribution." Packages require metadata like release versions, download locations and/or project home pages, whereas Artifacts in general may not. If TEA components are to be thought of like packages, having branch / release data may be fine. If the components are to represent the more general Artifacts, which may be embedded in a release but not released independently, it may be overly constraining.
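As a rough illustration of that distinction (not the actual SPDX model or its property names), the Python sketch below treats Package as a kind of Artifact that must carry at least one piece of release or distribution metadata, while a plain Artifact needs none. The field names and the "at least one" rule are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artifact:
    name: str  # a general Artifact needs no release metadata


@dataclass
class Package(Artifact):
    # Hypothetical metadata fields; the rule below is an assumed constraint.
    release_version: Optional[str] = None
    download_location: Optional[str] = None
    home_page: Optional[str] = None

    def __post_init__(self) -> None:
        if not (self.release_version or self.download_location or self.home_page):
            raise ValueError("a Package needs release or distribution metadata")


# An embedded part that is never released on its own.
embedded = Artifact("libexample.o")

# Something that is released and distributed independently.
released = Package(
    "example-lib",
    release_version="2.1.0",
    download_location="https://example.com/example-lib-2.1.0.tar.gz",
)

print(isinstance(released, Artifact), isinstance(embedded, Package))
```

If a TEA Component always requires Branch and Release data, it behaves like `Package` here; an embedded, never-released part would only fit the looser `Artifact` shape.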
@goneall thank you for your feedback. There is a use case document in the repository if you are interested. I also have a pull request adding to it.
This is the output from our last meeting and the brainstorm meeting