Proposal to add "properties" array to CDX License object/definition
My team has been updating legacy license scanning tools and integrating them with source code scanners that support SBOM generation in CycloneDX (CDX) format. Through this work we have found additional information that we would like to include alongside the "concluded" license. Specifically, we can provide a "confidence score" for the "regex" that matched, as well as information about the tool(s) used (name, version, reference, GitHub repository) and the packaged/compiled regex set shipped at that version. We also plan to publish canonical "templates" for the license variants we have found in real-world packages/software, supplementing the ones published by the SPDX (license identifier) community, which we could also hyperlink to.
However, with the introduction of the CDX v1.4 schema, most data definitions are marked "additionalProperties": false, which prevents us from adding fields to the License data structure (or other similar structures). I propose, at a minimum, adding an optional "properties" array to the "license" object/definition as the approved means of attaching such information. This would give viewers/auditors reviewing license data greater clarity, and would let them evaluate the license scanning tools used, in the hope that better regexes are submitted (as PRs) to improve these tools over time, thereby improving SBOM data overall. The current definition, abbreviated, is:
"license": {
  "type": "object",
  "title": "License Object",
  "oneOf": [
    ...
  ],
  "additionalProperties": false,
  "properties": {
    "id": {
      ...
    },
    "name": {
      ...
    },
    "text": {
      ...
    },
    "url": {
      ...
    }
  }
}
The resulting schema would look something like this (mirroring other uses of "properties"):
"license": {
  "type": "object",
  "title": "License Object",
  "oneOf": [
    ...
  ],
  "additionalProperties": false,
  "properties": {
    "id": {
      ...
    },
    "name": {
      ...
    },
    "text": {
      ...
    },
    "url": {
      ...
    },
    "properties": {
      "type": "array",
      "title": "Properties",
      "description": "Provides the ability to document properties in a name-value store. Formal registration is OPTIONAL.",
      "additionalItems": false,
      "items": {"$ref": "#/definitions/property"}
    }
  }
}
Example usage, based on the use cases mentioned above:
"properties": [
  {
    "name": "confidenceScore",
    "value": "0.98"
  },
  {
    "name": "regex",
    "value": "<regex here>"
  },
  {
    "name": "spdxTemplate",
    "value": "<url to spdx template>"
  },
  {
    "name": "toolRef",
    "value": "tool:github/[email protected]" // similar to a bom-ref (pURL)
  },
  ...
],
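As a rough illustration of how a consumer (for example, an audit script) might read these values, here is a minimal Python sketch. It assumes the proposed "properties" array is accepted as written; the helper name `properties_to_dict` and the sample license entry are hypothetical and not part of any CDX tooling.

```python
import json

# Hypothetical license entry that uses the proposed "properties" array.
license_json = """
{
  "id": "Apache-2.0",
  "properties": [
    {"name": "confidenceScore", "value": "0.98"},
    {"name": "spdxTemplate", "value": "<url to spdx template>"}
  ]
}
"""


def properties_to_dict(license_obj):
    """Collapse the name/value array into a dict.

    If a name appears more than once, the last value wins.
    A license without "properties" yields an empty dict.
    """
    return {p["name"]: p["value"] for p in license_obj.get("properties", [])}


license_obj = json.loads(license_json)
props = properties_to_dict(license_obj)

# Property values are strings, so numeric values must be parsed by the consumer.
confidence = float(props["confidenceScore"])
```

Because the values are plain strings, any typing (such as treating "confidenceScore" as a float) is a convention between producer and consumer, which is one argument for eventually promoting well-understood properties to named fields.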
I am open to turning these use cases into dedicated named fields or new objects; however, to "future proof" the schema and allow more information about a given license to be added later, a "properties" array is still needed.
In addition, I would like to expand the "tools" object to allow a "ref" (local ID), so that a tool can be referenced directly from the license structure. We may want to associate the specific tool used to find/surface a license with a given component, library type, language, etc. I will open a separate PR on this topic as soon as possible.
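To make the idea of a tool "ref" concrete, the following Python sketch shows how a license-level "toolRef" property could be resolved against the BOM's tool list. Everything specific here is an assumption for illustration only: the "ref" field on a tool does not exist in the current schema, and the tool name, version, and ref value are made up.

```python
def find_tool(bom, tool_ref):
    """Return the tool entry whose (hypothetical) "ref" equals tool_ref, or None."""
    for tool in bom.get("metadata", {}).get("tools", []):
        if tool.get("ref") == tool_ref:
            return tool
    return None


# Hypothetical BOM fragment: "ref" on the tool is the proposed addition.
bom = {
    "metadata": {
        "tools": [
            {"name": "example-license-scanner", "version": "1.0.0", "ref": "tool-1"}
        ]
    }
}

# Hypothetical license entry pointing back at the tool via a "toolRef" property.
license_obj = {
    "id": "MIT",
    "properties": [{"name": "toolRef", "value": "tool-1"}],
}

# Pick out the toolRef value (None if the property is absent), then resolve it.
tool_ref = next(
    (p["value"] for p in license_obj["properties"] if p["name"] == "toolRef"),
    None,
)
tool = find_tool(bom, tool_ref)
```

A local ID keeps the reference short and unambiguous within one BOM, in the same way a bom-ref does for components.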