Add support for SPDX v3

Open stevespringett opened this issue 3 years ago • 8 comments
Current Behavior:

  • SPDX support was removed for technical (and other) reasons from prior versions of DT
  • SPDX currently does not describe what something is, only what something could be. This makes it extremely challenging to support SPDX reliably in DT - one of the technical reasons support was dropped

Proposed Behavior:

  • SPDX v3 will include the necessary information which properly identifies components along with component types (similar to CycloneDX)
  • Add an optional and automatic translation layer that converts SPDX into CycloneDX for ingestion into DT
  • Limit SPDX support to modern serialization formats, such as JSON, which can be independently validated
  • Ideally, this work would be based on the work of @goneall and @coderpatros

stevespringett avatar Jun 23 '22 15:06 stevespringett

Thanks @stevespringett for adding the issue.

We are just about done with our first release candidate for the 2.3 spec.

It has a new field primaryPackagePurpose which is intended to cleanly map to the component types of CycloneDX. The documentation for this can be found here.

For the identifier, there is a unique ID for the package metadata (which translates to a CDX component), which is the concatenation of the Document Namespace and the SPDX ID. We decided to use Package URLs for the ID of the artifact the SPDX package refers to. This has been added as an External Ref type in 2.3. It is currently optional for compatibility reasons, but we could generate a warning or make it required when ingesting into Dependency-Track.
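
A minimal sketch of that identifier scheme, assuming an SPDX 2.3 JSON document that has already been parsed into a dictionary. The field names (`documentNamespace`, `SPDXID`, `externalRefs`, `referenceType`, `referenceLocator`) follow the SPDX 2.3 JSON serialization; the `#` separator, the `package_identifiers` helper, and the warning text are assumptions made for illustration, not part of the spec or of Dependency-Track.

```python
def package_identifiers(spdx_doc: dict) -> list[dict]:
    """Return the document-scoped ID and any purls for each package."""
    namespace = spdx_doc["documentNamespace"]
    results = []
    for pkg in spdx_doc.get("packages", []):
        # Collect every external reference declared with the "purl" type.
        purls = [
            ref["referenceLocator"]
            for ref in pkg.get("externalRefs", [])
            if ref.get("referenceType") == "purl"
        ]
        results.append({
            # Assumption: namespace and SPDX ID are joined with "#".
            "unique_id": f"{namespace}#{pkg['SPDXID']}",
            "purls": purls,
            # purl is optional in 2.3, so an importer could warn here.
            "warning": None if purls else "no purl external reference",
        })
    return results


# Hypothetical document used only to exercise the helper.
example_doc = {
    "documentNamespace": "https://example.org/spdxdocs/demo-1",
    "packages": [
        {
            "SPDXID": "SPDXRef-Package-a",
            "externalRefs": [
                {
                    "referenceCategory": "PACKAGE-MANAGER",
                    "referenceType": "purl",
                    "referenceLocator": "pkg:npm/a@1.0.0",
                }
            ],
        },
        {"SPDXID": "SPDXRef-Package-b"},
    ],
}

identifiers = package_identifiers(example_doc)
```

In this sketch the second package has no purl, so it is reported with a warning rather than rejected, which mirrors the "generate a warning" option mentioned above.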

Please let me know if the above is not sufficient or if there is anything else needed. There's still time to influence the SPDX 2.3 release to help with any requirements for dependency track integration.

goneall avatar Jun 23 '22 21:06 goneall

Thanks @goneall .

Unless I'm overlooking something, SPDX 2.3 still cannot state what a component is, rather, people have to use external reference identifiers which are "believed to be relevant to the Package" according to https://github.com/spdx/spdx-spec/blob/development/v2.3.1/chapters/how-to-use.md. There is no guidance on what to do when multiple identifiers of the same types are specified, or when conflicting identifiers or impossibilities are encountered.
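
To make the ambiguity concrete, here is a small sketch of how an importer might detect the case where one package carries several different identifiers of the same type. It assumes a package dictionary shaped like the SPDX 2.3 JSON serialization; the `ambiguous_identifiers` helper and the sample data are hypothetical, and what to do once a conflict is found is exactly the part the spec leaves open.

```python
from collections import defaultdict


def ambiguous_identifiers(package: dict) -> dict[str, list[str]]:
    """Map each reference type to its locators when more than one distinct locator exists."""
    by_type = defaultdict(set)
    for ref in package.get("externalRefs", []):
        by_type[ref.get("referenceType")].add(ref.get("referenceLocator"))
    # Keep only the types that have two or more distinct locators.
    return {t: sorted(locs) for t, locs in by_type.items() if len(locs) > 1}


# Hypothetical package with two different purls and a single CPE.
example_package = {
    "SPDXID": "SPDXRef-Package-lib",
    "externalRefs": [
        {"referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl",
         "referenceLocator": "pkg:npm/lib@1.0"},
        {"referenceCategory": "SECURITY", "referenceType": "purl",
         "referenceLocator": "pkg:maven/org.example/lib@1.0"},
        {"referenceCategory": "SECURITY", "referenceType": "cpe23Type",
         "referenceLocator": "cpe:2.3:a:example:lib:1.0:*:*:*:*:*:*:*"},
    ],
}

conflicts = ambiguous_identifiers(example_package)
```

The helper only reports the conflict; it deliberately does not pick a winner, since no rule for doing so is defined.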

The unique ID that was added which only supports PURL will also be a problem in that if a component only has a CPE, there's no way to represent it.

A future version of Dependency-Track will support evidence, including evidence of identity, along with where and how the evidence was obtained and the confidence of the evidence. This aligns to how many SCA products work, the OWASP SCVS BOM Maturity Model, and evidence support in CycloneDX v1.5. As of today, SPDX 2.3 falls somewhere in the middle. It cannot describe what something is, yet it also doesn't provide supporting evidence and confidence for its "beliefs". This makes supporting SPDX in Dependency-Track really challenging and was one of the technical reasons why SPDX support was previously removed.

For SPDX 2.3, I'm seeing ambiguity and overlapping definitions for the primaryPackagePurpose, some of which will make converting between SPDX and CDX more difficult. It will also make importing into Dependency-Track more difficult. I know the original intent was to aid in compatibility with CDX, however in practice, I don't think that's achieved. I'd be happy to discuss.

I also think there's some work that needs to be done with external reference categories, as they don't make a lot of sense IMO. For example, SWID is defined in the SECURITY category, yet I'm not aware of any use case that supports this today. SWID is supported by most CMDB discovery modules, such as ServiceNow, but these use cases have nothing to do with security. On the other hand, Package URL is defined in the "Package-Manager" category and purl is used 20B times every month by Dependency-Track for security use cases. Does the category restrict what the identifier can be used for? What happens when the category is SECURITY and a purl is defined? (which I've seen many times). There's too much ambiguity here.

Props to @goneall on the new SPDX Java Library. It is 1000x better than the original. The design and code quality are outstanding. Seriously. Great job! One thing that will prevent adoption by Dependency-Track is its support for XML without having a released XML schema. There currently is none - see https://github.com/spdx/spdx-spec/tree/development/v2.3.1/schemas. Dependency-Track needs to be able to independently validate BOMs prior to processing.

IMO, there are still too many outstanding issues with SPDX 2.3. I know SPDX v3 solves some of these issues, and hopefully this feedback can be used to correct some of the other issues which may not have been previously considered, but which are important for Dependency-Track adoption.

stevespringett avatar Aug 23 '22 04:08 stevespringett

@stevespringett

Unless I'm overlooking something, SPDX 2.3 still cannot state what a component is, rather, people have to use external reference identifiers which are "believed to be relevant to the Package" according to https://github.com/spdx/spdx-spec/blob/development/v2.3.1/chapters/how-to-use.md. There is no guidance on what to do when multiple identifiers of the same types are specified, or when conflicting identifiers or impossibilities are encountered.

As mentioned in the comment above, there is a unique ID for the package metadata (which translates to a CDX component), which is the concatenation of the Document Namespace and the SPDX ID. This is an absolutely unique identifier for the SPDX package (component in the CDX terminology) if the SPDX spec is followed correctly. The intent of the external identifier is to allow correlation to other identification schemes, such as CPE, however perfect or imperfect they may be. Since the SPDX identifiers are unique, you can use those when translating. Let me know if I'm missing something, but I think the unique SPDX identifier satisfies the translation requirements.

For SPDX 2.3, I'm seeing ambiguity and overlapping definitions for the primaryPackagePurpose, some of which will make converting between SPDX and CDX more difficult.

The primary package purpose was designed to map directly to the type property of the CDX component, which should make this easier. I would be happy to discuss. There is a bit of a nuance in that CDX calls a File a component type when it is really more of a property of what we call a package in SPDX. In the future, we should probably discuss these aspects during the SPDX spec update - I'll try to ping you for review in future updates so we're in sync before the release.
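
A rough sketch of that mapping, assuming the CycloneDX 1.4 component types (application, framework, library, container, operating-system, device, firmware, file) and the SPDX 2.3 `primaryPackagePurpose` values. The `to_cdx_type` helper is hypothetical, and returning `None` for purposes without a same-named CycloneDX type (such as SOURCE, ARCHIVE, INSTALL, and OTHER) is an illustrative choice, not a rule from either spec.

```python
# Component types as listed in CycloneDX 1.4; later versions add more.
CDX_COMPONENT_TYPES = {
    "application", "framework", "library", "container",
    "operating-system", "device", "firmware", "file",
}


def to_cdx_type(purpose):
    """Map an SPDX primaryPackagePurpose to a CycloneDX component type, or None."""
    if purpose is None:
        return None
    # Accept both hyphen and underscore spellings, e.g. OPERATING-SYSTEM
    # and OPERATING_SYSTEM, by normalizing to CycloneDX-style lowercase.
    candidate = purpose.strip().lower().replace("_", "-")
    return candidate if candidate in CDX_COMPONENT_TYPES else None


mapped = {p: to_cdx_type(p) for p in ("LIBRARY", "OPERATING_SYSTEM", "FILE", "SOURCE")}
```

Purposes that return `None` here are the ones a converter has to decide about on its own, which is where some loss of fidelity can creep in.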

I also think there's some work that needs to be done with external reference categories, as they don't make a lot of sense IMO.

I would suggest posting this to the SPDX spec - all input is welcome and we always appreciate improvements to the spec. In my opinion, the overlap of different external references creates ambiguities in the real world and representing those in a data spec is accurate and not an issue with the spec.

Props to @goneall on the new SPDX Java Library.

Thanks @stevespringett ! Appreciate the comments.

One thing that will prevent adoption by Dependency-Track is its support for XML without having a released XML schema

Completely agree - there has been some progress here, but I'm not much of an XML schema expert, so I could use some help. There is an issue in the SPDX spec where we are making some progress. I'll make a point of doing some additional work on the XML schema over the next couple of months.

IMO, there are still too many outstanding issues with SPDX 2.3.

Since the 2.3 spec changes were intentionally designed to ease the conversions between CDX and SPDX, I'm hoping I can change your mind on this. @coderpatros and I have written utilities to convert between the formats. Although there is some loss in fidelity, I believe the conversion is accurate enough that we can use the 2.3 spec.

goneall avatar Aug 23 '22 04:08 goneall

The SPDX Tech. team has decided their SPDX 3.0 spec. will be published canonically as a JSON-LD document, currently using a "Terse RDF Triple Language" (TTL) file format which can be found here:

  • https://github.com/spdx/spdx-3-model/blob/gh-pages/model.ttl

As of this writing, there are no plans to produce a JSON schema or XML schema version, according to a direct query to the tech. team. That exercise, I was told, is left to RDF modeling tools such as the one a team member uses (i.e., Apache Jena). There do not appear to be any existing libraries (open or not) that provide such translation, resolution of complex namespaces, and management of types (let alone validation).

mrutkows avatar Feb 13 '24 19:02 mrutkows

There are some ongoing discussions in the SPDX tech community about possibly supporting a simpler JSON format of the spec complete with a JSON schema.

If there is interest in having a simpler JSON schema, I would suggest voicing your support in the serialization team meetings or in the mailing list once the RC2 version of the spec is released (hopefully soon).

If we do support the simpler JSON format, I plan to translate the canonical SHACL/OWL schema file into a JSON schema file using the SPDX Java tools.

Since JSON doesn't support the same level of semantic validation provided by SHACL/OWL we would still recommend using the canonical SHACL/OWL schema for full validation.

goneall avatar Feb 14 '24 02:02 goneall