bug: SMAIL-GPL causing validate to fail

Open spiffcs opened this issue 4 months ago • 3 comments

👋 Sorry if this is being handled in another thread. I tried to find all coverage of current license issues and didn't see this one.

Reproduction

You might need to adjust the --platform flag in the command below for your local machine:

syft -o cyclonedx-json docker:nginx:latest | docker run -i --platform linux/amd64 cyclonedx/cyclonedx-cli:latest validate --input-format json

On instance: /components/12/licenses/2/license:
{"id":"SMAIL-GPL"}
Value should match one of the values specified by the enum
http://cyclonedx.org/schema/spdx.schema.json
On instance: /components/12/licenses/2/license/id:
SMAIL-GPL
Unable to validate against any JSON schemas.
BOM is not valid.

After removing the SMAIL-GPL License from the SBOM in component[12]:

cat new.json | docker run -i --platform linux/amd64 cyclonedx/cyclonedx-cli:latest validate --input-format json
BOM validated successfully.
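
For reference, the removal step can be scripted. This is a minimal sketch and not part of syft or cyclonedx-cli: the helper name `drop_license_id` is hypothetical, and it assumes the JSON layout visible in the error output (`components[].licenses[].license.id`). It only looks at top-level components, not nested ones.

```python
def drop_license_id(bom: dict, license_id: str) -> int:
    """Remove every license entry with the given id; return how many were dropped."""
    dropped = 0
    for component in bom.get("components", []):
        licenses = component.get("licenses")
        if not isinstance(licenses, list):
            continue  # component has no license list to filter
        kept = [
            entry for entry in licenses
            if not (
                isinstance(entry, dict)
                and isinstance(entry.get("license"), dict)
                and entry["license"].get("id") == license_id
            )
        ]
        dropped += len(licenses) - len(kept)
        component["licenses"] = kept
    return dropped


# In-memory example mirroring the licenses of component 12 in the report.
example_bom = {
    "components": [
        {
            "licenses": [
                {"license": {"id": "GPL-2.0-only"}},
                {"license": {"id": "GPL-2.0-or-later"}},
                {"license": {"id": "SMAIL-GPL"}},
                {"license": {"name": "public-domain"}},
            ]
        }
    ]
}
removed = drop_license_id(example_bom, "SMAIL-GPL")
```

To apply it to a real file, load the SBOM with `json.load`, call the helper, and write the result back out with `json.dump` before piping it to the validator.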

I'm not sure which version of http://cyclonedx.org/schema/spdx.schema.json the validator is pulling from, since I do see SMAIL-GPL included there. If this is fixed by a new release, then no harm, no foul; I'll close the issue when the new release comes out 😄

Also, I noticed that this one failure produced a TON of noise above it. Here is the full output of the error:

On instance: /components/12/licenses:
[{"license":{"id":"GPL-2.0-only"}},{"license":{"id":"GPL-2.0-or-later"}},{"license":{"id":"SMAIL-GPL"}},{"license":{"name":"public-domain"}}]
Value should have at most 1 items
http://cyclonedx.org/schema/bom-1.6.schema.json#/oneOf/1

On instance: /components/12/licenses:
[{"license":{"id":"GPL-2.0-only"}},{"license":{"id":"GPL-2.0-or-later"}},{"license":{"id":"SMAIL-GPL"}},{"license":{"name":"public-domain"}}]
Required properties ["expression"] are not present
http://cyclonedx.org/schema/bom-1.6.schema.json#/oneOf/1/items/0

On instance: /components/12/licenses/0:
{"license":{"id":"GPL-2.0-only"}}
All values fail against the false schema
http://cyclonedx.org/schema/bom-1.6.schema.json#/oneOf/1/additionalItems

On instance: /components/12/licenses/1:
{"license":{"id":"GPL-2.0-or-later"}}
All values fail against the false schema
http://cyclonedx.org/schema/bom-1.6.schema.json#/oneOf/1/additionalItems

On instance: /components/12/licenses/2:
{"license":{"id":"SMAIL-GPL"}}
All values fail against the false schema
http://cyclonedx.org/schema/bom-1.6.schema.json#/oneOf/1/additionalItems

On instance: /components/12/licenses/3:
{"license":{"name":"public-domain"}}
All values fail against the false schema
http://cyclonedx.org/schema/bom-1.6.schema.json#/oneOf/1/items/0/additionalProperties

On instance: /components/12/licenses/0/license:
{"id":"GPL-2.0-only"}
Required properties ["name"] are not present
http://cyclonedx.org/schema/bom-1.6.schema.json#/oneOf/1

On instance: /components/12/licenses/2/license:
{"id":"SMAIL-GPL"}
Value should match one of the values specified by the enum
http://cyclonedx.org/schema/spdx.schema.json
On instance: /components/12/licenses/2/license/id:
SMAIL-GPL
Unable to validate against any JSON schemas.
BOM is not valid.

The logic basically keys on SMAIL-GPL not being a valid SPDX ID, and because of that it then reports all the other entries as failing against the schema for one reason or another ☹️ I think there might be some ways to clean this up, but I defer to the maintainers on what they think the best presentation of errors like this should be.

spiffcs avatar Sep 05 '25 18:09 spiffcs

The short summary is: the cli doesn't use the latest version of the schema http://cyclonedx.org/schema/spdx.schema.json, and this is why the validation fails.

The schema is provided by the underlying cyclonedx-dotnet-library here: https://github.com/CycloneDX/cyclonedx-dotnet-library/blob/main/src/CycloneDX.Core/Schemas/spdx.schema.json and it is currently at version "v1.0-3.24.0" instead of "v1.0-3.26.0". There is already a PR to update it: https://github.com/CycloneDX/cyclonedx-dotnet-library/pull/401 . Once this is completed, the library needs a new release, followed by a new release of the cli that updates the library version.
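
To check a given copy of the schema by hand, something like the following could work. It is only a sketch: the helper name `describe_spdx_schema` is made up, and it assumes the file keeps its version string in a top-level `"$comment"` and the license IDs in a top-level `"enum"`, which should be verified against the actual file.

```python
def describe_spdx_schema(schema: dict, license_id: str) -> tuple:
    """Return (version string, whether license_id is listed in the enum).

    Assumes (unverified here) a top-level "$comment" holding the version
    and a top-level "enum" listing the accepted SPDX license IDs.
    """
    version = schema.get("$comment", "unknown")
    known = license_id in schema.get("enum", [])
    return version, known


# Tiny stand-in for a schema file that predates SMAIL-GPL; not real schema content.
old_schema = {"$comment": "v1.0-3.24.0", "enum": ["GPL-2.0-only", "GPL-2.0-or-later"]}
result = describe_spdx_schema(old_schema, "SMAIL-GPL")
```

For a real check, read a downloaded spdx.schema.json with `json.load` and pass the resulting dict in place of `old_schema`.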

With respect to the number of validation messages that are generated: it is not so easy to reduce the noise. It used to be much worse; I already improved it quite a bit. The basic problem is the combinatorial explosion of the subschemas:

What I mean is that licenses has two forms: https://cyclonedx.org/docs/1.6/json/#components_items_licenses

EITHER (list of SPDX licenses and/or named licenses) OR (tuple of one SPDX License Expression)

You get most of the messages because the validator also tries to match the input against the "tuple of one SPDX License Expression" branch. In addition, each license also has two different forms (https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i0_items_license), which causes additional messages.
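
A toy model of that fan-out, as a rough illustration only. This is not the JsonSchema.Net logic or the real CycloneDX schema; the two check functions and the `KNOWN_IDS` set are simplified stand-ins for the two `licenses` branches and the bundled SPDX enum.

```python
KNOWN_IDS = {"GPL-2.0-only", "GPL-2.0-or-later"}  # stand-in for the bundled enum


def check_license_list(licenses):
    """Branch 0: a list of {"license": {"id": ...}} or {"license": {"name": ...}}."""
    errors = []
    for i, entry in enumerate(licenses):
        lic = entry.get("license")
        if not isinstance(lic, dict):
            errors.append(f"/licenses/{i}: missing 'license' object")
        elif "id" in lic:
            if lic["id"] not in KNOWN_IDS:
                errors.append(f"/licenses/{i}/license/id: not in enum")
        elif "name" not in lic:
            errors.append(f"/licenses/{i}/license: needs 'id' or 'name'")
    return errors


def check_expression_tuple(licenses):
    """Branch 1: exactly one item, which must carry 'expression'."""
    errors = []
    if len(licenses) > 1:
        errors.append("/licenses: at most 1 item")
    if licenses and "expression" not in licenses[0]:
        errors.append("/licenses/0: required 'expression' missing")
    for i in range(1, len(licenses)):
        errors.append(f"/licenses/{i}: additional item not allowed")
    return errors


def validate_one_of(licenses):
    """Valid only if exactly one branch passes; otherwise report every branch's errors."""
    branches = [check_license_list(licenses), check_expression_tuple(licenses)]
    passing = sum(1 for errs in branches if not errs)
    if passing == 1:
        return []
    if passing > 1:
        return ["/licenses: matches more than one branch"]
    return [e for errs in branches for e in errs]


LICENSES = [
    {"license": {"id": "GPL-2.0-only"}},
    {"license": {"id": "GPL-2.0-or-later"}},
    {"license": {"id": "SMAIL-GPL"}},
    {"license": {"name": "public-domain"}},
]
noisy = validate_one_of(LICENSES)
clean = validate_one_of([e for e in LICENSES if e["license"].get("id") != "SMAIL-GPL"])
```

As long as the list branch passes, the failures of the expression branch are irrelevant and nothing is reported. Once a single unknown ID breaks the list branch, no branch matches, so the errors of both branches surface together.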

The logic is here: https://github.com/CycloneDX/cyclonedx-dotnet-library/blob/main/src/CycloneDX.Core/Json/Validator.cs but the validation comes from JsonSchema.Net.

I know that some validators try to figure out the most likely error message (e.g. Python's jsonschema), mostly as an option; however, this involves some heuristics.
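
One such heuristic, sketched under loud assumptions: prefer the error whose instance path is deepest, on the theory that it points closest to the actual culprit. This is not how Python's jsonschema or JsonSchema.Net implement it, and the path/message pairs below are simplified from the output in the report.

```python
def path_depth(path: str) -> int:
    """Number of segments in a JSON-pointer-like path such as /components/12/licenses."""
    return len([segment for segment in path.split("/") if segment])


def most_specific(errors):
    """Pick the (path, message) pair with the deepest instance path."""
    return max(errors, key=lambda error: path_depth(error[0]))


# Simplified (path, message) pairs based on the full output in the report.
ERRORS = [
    ("/components/12/licenses", "Value should have at most 1 items"),
    ("/components/12/licenses", 'Required properties ["expression"] are not present'),
    ("/components/12/licenses/0", "All values fail against the false schema"),
    ("/components/12/licenses/0/license", 'Required properties ["name"] are not present'),
    ("/components/12/licenses/2/license/id", "Value should match one of the values specified by the enum"),
]
best = most_specific(ERRORS)
```

On this input the heuristic selects the enum failure on the license ID, but a depth-based rule can just as easily pick a misleading message on other inputs, which is why such selection is usually optional.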

We could try to improve the pruning, but the current pruning was the best solution I saw back then.

andreas-hilti avatar Sep 06 '25 07:09 andreas-hilti

Thanks for the summary and context on why the validation messages are the way they are @andreas-hilti that makes a ton of sense!

Definitely not right now, but is there a chance in the future for the design to be changed where the cli attempts to use the latest version?

Basically, it would have the option to look up the latest schema at runtime, but would fall back to the bundled version if it can't query the latest schema for whatever reason. It might smooth out some of the headaches across tools that consume/produce CycloneDX SBOMs.
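
Roughly what I have in mind, as a sketch only. The cli is .NET, so this is not a proposed implementation; the function names `fetch_latest` and `load_schema` are hypothetical and only illustrate the try-remote-then-fall-back idea.

```python
import json
import urllib.request

SCHEMA_URL = "http://cyclonedx.org/schema/spdx.schema.json"


def fetch_latest(url: str = SCHEMA_URL, timeout: float = 5.0) -> dict:
    """Download and parse the schema; raises OSError or ValueError on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def load_schema(bundled: dict, fetch=fetch_latest):
    """Return (schema, source): the remote schema if reachable, else the bundled copy."""
    try:
        return fetch(), "remote"
    except (OSError, ValueError):
        # Network errors (URLError is an OSError) and malformed JSON
        # (JSONDecodeError is a ValueError) both fall back to the bundled schema.
        return bundled, "bundled"


def offline_fetch():
    """Stand-in fetcher that simulates having no network access."""
    raise OSError("network unreachable")


bundled_schema = {"$comment": "bundled copy", "enum": ["GPL-2.0-only"]}
schema, source = load_schema(bundled_schema, fetch=offline_fetch)
```

Returning the source alongside the schema lets the caller log which copy was actually used, which matters for reproducing a validation result later.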

We love having the cli as part of our validation suite to make sure we're doing the right thing in syft here, but the tests can get a little flaky when license drift or other conflicts come up 😢

spiffcs avatar Sep 07 '25 01:09 spiffcs

If you ask me, I wouldn't make this

"Definitely not right now, but is there a chance in the future for the design to be changed where the cli attempts to use the latest version?"

the default behavior (for reproducibility, etc.), but as an option, why not?

I guess there are a couple of technical questions to be clarified, given that JSON validation uses a global registry (in contrast to XML): https://github.com/CycloneDX/cyclonedx-dotnet-library/blob/6bffc9a516256459d40739072301d207c0758b68/src/CycloneDX.Core/Json/Validator.cs#L40-L45 , in particular how to get reasonable behavior when the library is used outside of the cli.

andreas-hilti avatar Sep 07 '25 07:09 andreas-hilti