spdx-spec
Persistent URIs for the RDF
Dear community,
There is a serious backwards-incompatibility issue with the RDF version. Having version numbers in namespaces (e.g. ns1: https://spdx.org/rdf/3.0.1/terms/Core/ ) means that all implementations have to revise their URIs and internal data exchanges whenever a new version of SPDX is published.
In version 2.3 there was a versionless URI present: https://spdx.org/rdf/terms/#Checksum However, that does not reflect the latest version.
Could you consider re-establishing the versionless namespace, and including in your version management an approach with PURIs (persistent URIs), as in version 2.3? Moreover, in the latest publication at https://spdx.github.io/spdx-spec/v2.3/ the RDF version https://spdx.org/rdf/terms/ cannot be found.
I am raising this question because of this issue: https://github.com/SEMICeu/DCAT-AP/issues/402
kr,
Bert
In addition, please also consider the difference between classes and properties, and instances.
E.g. http://spdx.org/rdf/terms#checksumAlgorithm_sha1 is an instance of a Checksum Algorithm.
Any implementation that uses that internally as identifier has to update now to
https://spdx.org/rdf/3.0.1/terms/Core/HashAlgorithm/sha1.
This is not backwards compatible.
It is recommended to manage the versions of instances separately from the data model, because new algorithms can also be added to the class Checksum Algorithm.
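To illustrate what this means for an implementation that stores these IRIs internally, here is a minimal Python sketch of a lookup table kept separate from the data-model namespace. The table name and the helper function are hypothetical, and only the sha1 pair is taken from this thread. This is an implementation-side workaround, not something the SPDX specification defines, and any further entries would have to be checked against both specifications.

```python
from typing import Optional

# Hypothetical lookup table, maintained independently of the model version.
# Only the sha1 pair below appears in this thread.
V2_TO_V3_HASH_ALGORITHM = {
    "http://spdx.org/rdf/terms#checksumAlgorithm_sha1":
        "https://spdx.org/rdf/3.0.1/terms/Core/HashAlgorithm/sha1",
}


def to_v3_hash_algorithm(iri: str) -> Optional[str]:
    """Return the 3.0.1 IRI for a known v2 checksum-algorithm IRI, else None."""
    return V2_TO_V3_HASH_ALGORITHM.get(iri)
```

Returning None for unknown IRIs keeps the caller from silently guessing a mapping that neither specification states.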
If the model has owl:sameAs, will that resolve the concern? (partly or entirely)
Also, would it help if the version number in the IRI changed less often? (For example, having it as /3/ or /3.0/ instead of /3.0.1/? Or if the IRI were stationary: say, if A has /3.0.1/ it will be /3.0.1/ forever, even if the entire spec is now at /3.4/. The new elements introduced between 3.0.1 and 3.4 would have other IRIs based on when they were introduced?) I think it is quite late in the process, but more feedback will definitely help us be more informed.
It certainly helps to add these mappings (owl:sameAs or skos:Match*) into the RDF.
I know it is not easy and it may impact the publication flow. But it is a materialisation of the choices on how you maintain backwards compatibility.
This is always the challenge with RDF: it tries to bind the human-readable specification to the machine-used specification, and is therefore sensitive to URI changes that have no semantic impact. E.g. if a new release SPDX 3.0.2 is created because a misspelling of a contributor's name is corrected, then this is fine from a document version perspective. But also changing the version numbers in the URIs then creates a serious impact, while nothing has changed in that file.
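As a rough sketch of what such a mapping statement could look like, the Python below serializes one IRI-to-IRI statement as an N-Triples line using only the standard library. The helper name is hypothetical. skos:closeMatch is chosen here purely for illustration as the weaker of the two kinds of statement mentioned; whether any mapping between these two IRIs is valid is for the SPDX maintainers to decide.

```python
# Standard W3C predicate IRIs for the two kinds of mapping mentioned above.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# The two IRIs quoted earlier in this thread.
OLD_SHA1 = "http://spdx.org/rdf/terms#checksumAlgorithm_sha1"
NEW_SHA1 = "https://spdx.org/rdf/3.0.1/terms/Core/HashAlgorithm/sha1"


def mapping_triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one IRI-to-IRI statement as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Illustration only: asserts a close match, not identity.
line = mapping_triple(OLD_SHA1, SKOS_CLOSE_MATCH, NEW_SHA1)
print(line)
```

Swapping in OWL_SAME_AS would assert full identity between the two resources, which is a much stronger claim.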
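The sensitivity can be shown with a small Python sketch. The 3.0.2 IRI below is hypothetical (it only follows the pattern of the 3.0.1 IRI quoted in this thread), and the regular expression and helper name are assumptions made for illustration. The point is only that a consumer comparing IRIs as strings sees two different identifiers even when the part after the version segment is unchanged.

```python
import re
from typing import Optional, Tuple

# Assumed pattern for the versioned namespace seen in this thread,
# e.g. https://spdx.org/rdf/3.0.1/terms/...
VERSIONED = re.compile(r"^https://spdx\.org/rdf/(\d+(?:\.\d+)*)/terms/(.+)$")


def split_version(iri: str) -> Optional[Tuple[str, str]]:
    """Split a versioned IRI into (version, remainder), or None if no match."""
    match = VERSIONED.match(iri)
    if match is None:
        return None
    return match.group(1), match.group(2)


current = "https://spdx.org/rdf/3.0.1/terms/Core/HashAlgorithm/sha1"
# Hypothetical IRI for a patch release that changed nothing about sha1.
patched = "https://spdx.org/rdf/3.0.2/terms/Core/HashAlgorithm/sha1"

same_identifier = current == patched
same_remainder = split_version(current)[1] == split_version(patched)[1]
print(same_identifier, same_remainder)
```

This is not a suggestion to strip versions when processing SPDX data; it only makes the identifier churn visible.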
I think that, by design, there will be incompatibilities between 2 and 3. There is documentation for the migration: https://github.com/spdx/using/blob/main/docs%2Fdiffs-from-previous-editions.md
But incompatibilities within 3 are probably undesirable.
The incompatibility between SPDXv2 and SPDXv3 is by design: we had to make breaking changes, that's why we moved to a new major release number. So http://spdx.org/rdf/terms#checksumAlgorithm_sha1 and https://spdx.org/rdf/3.0.1/terms/Core/HashAlgorithm/sha1 are definitely different things (they don't even have the same type). There is no backwards compatibility between v3 and v2.
When you say "Any implementation that uses that internally as identifier has to update", of course it has. SPDXv3 is a very different standard and any implementation has to be changed significantly in order to implement it. In general, you should only be using RDF items from the version you are processing.
For future releases to SPDXv3, there will be RDF mechanisms to support compatibility, where it makes sense and is valid.
@zvr thanks for the reassurance about stability and for providing compatibility.
Can you point me to the rules when a new release is created? Or is this not yet determined?
Just a reflection (this is not an objection to the model change): https://github.com/spdx/using/blob/main/docs%2Fdiffs-from-previous-editions.md#checksum-algorithm indicates a name change. In https://github.com/spdx/using/blob/main/docs%2Fdiffs-from-previous-editions.md#a2-differences-between-v23-and-v222- the changelog mentions added hash algorithms ... All these things indicate that they are connected and somehow fill the same need. Yes, they are RDF-wise different entities, but if one looks in both specs then the conclusion is that these concepts are connected. That is the reason why I speak of a compatibility issue. I have no problem with (breaking) changes as such.
I am concerned that if version numbers appear in URIs used as identifiers, dependent specifications will always have to update the URIs, often for no benefit. I find it strange that the versionless namespace http://spdx.org/rdf/terms seems to be version v2. I would have expected v3 to be implemented in that namespace as well. Maybe that is a wrong expectation. If that is the case, I would recommend adding a note in the versionless namespace to explain the intended use of the RDF.
@bertvannuffelen please see also this issue, as there was a discussion about semantic information between SPDX 2 and SPDX 3:
- https://github.com/spdx/spdx-spec/issues/970
@bertvannuffelen There is a new SPDX crypto-algorithms work group that might be of interest to you.
Its repo is at https://github.com/spdx/crypto-algorithms and they intend to create and maintain a list of cryptographic algorithms and their characteristics -- including their IDs.
The idea is similar to SPDX License Identifiers, like Apache-2.0 and AGPL-3.0-or-later and their corresponding URIs, but for cryptographic algorithms. These URIs are independent from SPDX SBOM specification versions.
The first meeting will be held on 2025-05-07, please see this announcement: https://lists.spdx.org/g/spdx/message/1983
I think this has been sorted already, or there's another issue tracking it now. @bertvannuffelen - if there is still something outstanding, please open a new issue, and we'll track it there.
Moving this issue discussion over to https://github.com/spdx/spdx-3-model/issues/1146