JSON-LD identifiers are not dereferenceable

Open berezovskyi opened this issue 5 months ago • 16 comments

The following request fails with 404:

curl -X GET 'https://spdx.org/rdf/Core/' \
  --header 'Accept: application/ld+json;q=0.9, text/turtle;q=0.8, application/n-triples;q=0.5, application/rdf+xml;q=0.4'

It also fails with simpler Accept values like application/ld+json or text/html and blank values.

This would interfere with JSON-LD aware tooling that's not 100% hardcoded but tries to actually follow the metadata. Would it be possible to fix the deployment, please?

berezovskyi avatar Jul 14 '25 19:07 berezovskyi

On our self-hosted infra, we use a tiny content negotiation tool: https://github.com/oslc-op/website-content-negotiation/blob/master/connego/main.go

For Apache servers, a similar config is https://github.com/perma-id/w3id.org/blob/d1ea6b6593c8f1573a3b61a82c91387d629525f2/gaia-x/.htaccess

I see that you are using CloudFront/S3, so it could be a bit tricky. LLMs suggest Lambda@Edge code like this:

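'use strict';
// Lambda@Edge handler: rewrite the request URI based on the Accept header.
exports.handler = async (event) => {
    const request = event.Records[0].cf.request;
    // Header names are lowercase keys; each maps to a list of { key, value } objects.
    const acceptHeader = request.headers['accept'] ? request.headers['accept'][0].value : '';
    // Default file
    let newUri = '/ontology.html';
    // Plain substring checks: q-values are ignored and the first matching branch wins.
    if (acceptHeader.includes('application/rdf+xml')) {
        newUri = '/ontology.owl';
    } else if (acceptHeader.includes('application/ld+json')) {
        newUri = '/ontology.jsonld';
    } else if (acceptHeader.includes('text/turtle')) {
        newUri = '/ontology.ttl';
    }
    // Serve the selected file in place of the requested path.
    request.uri = newUri;
    return request;
};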

I have not tested the above, but a blog post seems to use a similar request.uri pattern.
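
Editor's sketch, not part of the original comment: the snippet above picks a format by branch order, so the Accept header from the failing curl request would get RDF/XML even though it ranks JSON-LD highest. A minimal q-value-aware selection could look like the following. The file names are carried over from the snippet above and are hypothetical, and `parseAccept` and `chooseUri` are illustrative names, not an existing API.

```javascript
'use strict';

// Hypothetical mapping from media type to file; the names follow the
// snippet above and are not the actual SPDX deployment layout.
const VARIANTS = new Map([
  ['text/html', '/ontology.html'],
  ['application/ld+json', '/ontology.jsonld'],
  ['text/turtle', '/ontology.ttl'],
  ['application/rdf+xml', '/ontology.owl'],
]);
const DEFAULT_URI = '/ontology.html';

// Parse an Accept header into [{ type, q }], highest q first.
// Entries with q=0 (explicitly not acceptable) are dropped.
function parseAccept(header) {
  return header
    .split(',')
    .map((part) => {
      const [type, ...params] = part.split(';').map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return { type: type.toLowerCase(), q: Number.isNaN(q) ? 0 : q };
    })
    .filter((entry) => entry.type !== '' && entry.q > 0)
    .sort((a, b) => b.q - a.q);
}

// Return the file for the most preferred media type we can serve.
function chooseUri(acceptHeader) {
  for (const { type } of parseAccept(acceptHeader || '')) {
    if (VARIANTS.has(type)) return VARIANTS.get(type);
    if (type === '*/*') return DEFAULT_URI;
  }
  return DEFAULT_URI;
}
```

In the handler above, this would replace the if/else chain with `request.uri = chooseUri(acceptHeader);`. The sketch ignores partial wildcards such as `text/*` and media type parameters other than `q`.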

P.S. Also, it would be easiest to end the namespaces with a hash instead of a slash, i.e. not @prefix ns1: <https://spdx.org/rdf/3.0.1/terms/Core/> . but @prefix ns1: <https://spdx.org/rdf/3.0.1/terms/Core#> . and then serve one file per namespace (plus, when a client requests HTML, you can redirect them to a page with the corresponding anchors for a better UX). I guess it's a bit late for breaking changes like this, though.
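
Editor's sketch, not part of the original comment: the reason hash namespaces need only one file per namespace is that a client strips the fragment before sending the request, so every term in the namespace resolves to the same document. The small function below illustrates this with the standard `URL` class. The term IRI `.../Core#Element` is a hypothetical hash-style variant, and `documentUrlFor` is an illustrative name.

```javascript
'use strict';

// Return the URL a client actually requests when dereferencing a term IRI.
// The fragment (everything from '#' onward) is never sent to the server.
function documentUrlFor(termIri) {
  const url = new URL(termIri);
  url.hash = '';
  return url.href;
}

// Hash namespace (hypothetical): all terms share one document.
console.log(documentUrlFor('https://spdx.org/rdf/3.0.1/terms/Core#Element'));
// Slash namespace (current): every term is its own URL that must resolve.
console.log(documentUrlFor('https://spdx.org/rdf/3.0.1/terms/Core/Element'));
```

With slash namespaces, the server has to answer for each term path separately, which is why the S3 deployment needs either one object per term or a rewrite rule.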

berezovskyi avatar Jul 14 '25 19:07 berezovskyi

@goneall should we move this to spdx-spec repo and label it as "website improvement"? As it is related to HTTP redirection/resolution.

bact avatar Aug 12 '25 16:08 bact

Moving to the spec repo as suggested by @bact - we should be able to fix this on the website. The content is hosted on AWS S3 buckets - I'll need to research how to implement the proper redirects since we don't really have a proper webserver front-ending the content.

goneall avatar Aug 12 '25 17:08 goneall

I don't think a URL without any version number should resolve to anything. What is it supposed to point to?

zvr avatar Aug 12 '25 20:08 zvr

From SPDX tech call on 19 August 2025:

Need at least a major version in the URL - but the major version will point to the most recent minor version (e.g. spdx-3 will point to spdx 3.1 once 3.1 is released). We will always have the minor version URLs point to the most recent patch version. We will not use the patch versions in the URLs.
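
Editor's sketch, not part of the original comment: the rule above (a major version resolves to its newest minor, a minor version resolves to its newest patch, and patch versions do not appear in URLs) can be expressed as a small lookup. The release list in the usage lines is hypothetical, and `resolveVersion` is an illustrative name rather than anything in the SPDX tooling.

```javascript
'use strict';

// Compare two dotted versions numerically, e.g. '3.0.1' vs '3.1.0'.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i += 1) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Resolve a URL version segment ('3' or '3.0') to the newest matching
// release. Returns null for unknown versions and for full patch versions,
// which are not meant to be used in URLs.
function resolveVersion(alias, releases) {
  const parts = alias.split('.');
  const wellFormed = parts.every((p) => /^\d+$/.test(p));
  if (!wellFormed || parts.length > 2) return null;
  const matches = releases.filter((r) => r.startsWith(alias + '.'));
  if (matches.length === 0) return null;
  return matches.sort(compareVersions).pop();
}

// Hypothetical release list, assuming 3.1.0 has been published.
const releases = ['3.0.0', '3.0.1', '3.1.0'];
console.log(resolveVersion('3', releases));
console.log(resolveVersion('3.0', releases));
```

Under this sketch, publishing a new release only changes the release list; the major and minor URLs pick up the new target without being edited.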

goneall avatar Aug 19 '25 16:08 goneall

@goneall thank you for looking into the issue and for posting an update.

I have no opinion to offer on the versioning and version in the URIs.

As far as I can see under https://spdx.github.io/spdx-spec/v3.0/rdf/spdx-context.jsonld, all URIs seem to have the patch version in them and none are online. For example, https://spdx.org/rdf/3.0.1/terms/AI/autonomyType is 404. This means that the LD in JSON-LD is not quite valid. See https://www.w3.org/DesignIssues/LinkedData for more info.

If I may offer an example of JSON-LD URIs done right (in my humble opinion - only concerning the availability, not in any way suggesting an approach to versioning):

http://open-services.net/ns/rm#Requirement is a URI (not merely a URL) for a Requirement entity type in the OSLC standard.

When you open that URI as a URL in a browser, you get redirected to an HTML page of the spec and jump to the correct anchor:

Image

When you request JSON-LD for the same URI, you get a valid JSON-LD document where a resource with the exact id http://open-services.net/ns/rm#Requirement is present:

Image
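
Editor's sketch, not part of the original comment: the check described above (the returned JSON-LD document must contain a resource whose id is exactly the URI that was requested) can be automated once the response body has been parsed. The function below is a minimal version that assumes node ids are written as absolute IRIs, either at the top level, in a top-level array, or inside a top-level `@graph`. It does not expand compacted or prefixed ids, and the function names are illustrative.

```javascript
'use strict';

// Collect the top-level node objects of a parsed JSON-LD document:
// the items of a top-level array, the members of @graph, or the document itself.
function topLevelNodes(doc) {
  if (Array.isArray(doc)) return doc;
  if (doc && Array.isArray(doc['@graph'])) return doc['@graph'];
  return doc ? [doc] : [];
}

// True if some top-level node has exactly the given IRI as its @id.
function containsResource(doc, iri) {
  return topLevelNodes(doc).some((node) => Boolean(node) && node['@id'] === iri);
}

// Usage with a hand-written sample document (not an actual server response).
const sample = {
  '@graph': [
    { '@id': 'http://open-services.net/ns/rm#Requirement', '@type': 'http://www.w3.org/2000/01/rdf-schema#Class' },
  ],
};
console.log(containsResource(sample, 'http://open-services.net/ns/rm#Requirement'));
```

A complete check would first run the document through a JSON-LD processor to expand it, so that ids written with prefixes or relative to a base are compared in absolute form.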

P.S. Just tried without the patch version - https://spdx.org/rdf/3.0/terms/AI/autonomyType is still not resolvable.

berezovskyi avatar Aug 19 '25 19:08 berezovskyi

@berezovskyi you are correct that URIs are not currently resolvable.

@goneall wrote the outcome of today's tech call, where the issue was discussed and the decisions were recorded. Please note that he clearly stated "the major version will point", "We will always have", etc. [emphasis added]

This has not been implemented yet (in the 4 hours after the call); it was added to our list of things to do.

After publishing the specification in different formats (HTML, PDF, etc.), we will also publish the RDF ontologies. Take a look at how we had done it in the past for SPDXv2: for example, the License definition.

No worries, it will appear for SPDXv3 as well.

zvr avatar Aug 19 '25 20:08 zvr

Thanks @zvr, I understand the work is yet to be done for v3.

Small note regarding your v2 example: http://spdx.org/rdf/terms#License fails to resolve to a JSON-LD where a resource with the URI http://spdx.org/rdf/terms#License is present:

$ curl -X GET 'http://spdx.org/rdf/terms#License' -L --header 'Accept: application/ld+json;q=0.9, text/turtle;q=0.8, application/n-triples;q=0.5, application/rdf+xml;q=0.4' | head

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>SPDX 2.3</title>
    <style>
      dl{background:#FFF;width:100%;border:none;margin-top:0;margin-bottom:0;padding-top:0;padding-bottom:0;}
      body{font-family:sans-serif;color:#000;background:#FFF;background-position:top left;background-attachment:fixed;background-repeat:no-repeat;text-align:justify;margin:0;padding:2em 1em 2em 70px;}
      :link{color:#00C;background:transparent;}
      :visited{color:#609;background:transparent;}
      a:active{color:#C00;background:transparent;}

berezovskyi avatar Aug 20 '25 09:08 berezovskyi

ref: Best Practice Recipes for Publishing RDF Vocabularies https://www.w3.org/TR/swbp-vocab-pub/

bact avatar Aug 20 '25 13:08 bact

I attempted to just copy over the gh-pages model HTML files from https://spdx.github.io/ - however, there were too many relative references, so it didn't work.

I now have the ability to copy / sync directories up to the website.

I'm thinking the best approach would be to modify the spec-parser to generate pages that redirect to the absolute URLs on gh-pages. The current redirect generation uses relative paths, which won't work. Once the spec-parser is updated, I can run it and push pages up to S3.
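
Editor's sketch, not part of the original comment: a generated redirect page with an absolute target could look roughly like the output of the function below. The base URL and the `Core/Classes/Element/` path are assumptions made for illustration, not values taken from the spec-parser, and `redirectPage` is a hypothetical helper. The actual spec-parser is a separate tool and may build its pages differently.

```javascript
'use strict';

// Hypothetical base URL of the published model pages on gh-pages.
const GH_PAGES_BASE = 'https://spdx.github.io/spdx-spec/v3.0.1/model/';

// Escape characters that are unsafe inside HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build a static HTML page that redirects to an absolute URL under `base`.
function redirectPage(relativePath, base = GH_PAGES_BASE) {
  const target = escapeHtml(new URL(relativePath, base).href);
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head>',
    '<meta charset="utf-8">',
    `<meta http-equiv="refresh" content="0; url=${target}">`,
    `<link rel="canonical" href="${target}">`,
    '</head>',
    `<body><p>Redirecting to <a href="${target}">${target}</a>.</p></body>`,
    '</html>',
  ].join('\n');
}

console.log(redirectPage('Core/Classes/Element/'));
```

A page like this only helps clients that render HTML. A client asking for application/ld+json or text/turtle would still receive HTML, so the content negotiation question raised at the top of this issue remains separate.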

After we verify this works manually, we can use the AWS s3 sync github action to automate this.

goneall avatar Aug 29 '25 17:08 goneall

> I'm thinking the best approach would be to modify the spec-parser to generate pages that redirect to the absolute URLs on gh-pages.

@bact @zvr @ilans - Let me know if you agree with the approach suggested above. I can take a stab at updating the spec-parser and create a pull request if you like.

goneall avatar Aug 29 '25 17:08 goneall

I agree with doing it in spec-parser, since that is where we have knowledge of the actual model (no need for heuristic guesswork based on paths).

bact avatar Aug 29 '25 20:08 bact

@goneall I'm sorry, but I don't understand what the suggested approach is -- or even what the end result that we want to achieve is.

Are we talking about what would be at the end of a URL like https://spdx.org/rdf/3.0/terms/Core/Element ?

If yes, I strongly believe that this should be generated by an RDF-ontology-to-web-pages tool that would read our ontology, as we were doing in SPDXv2 producing https://spdx.org/rdf/ontology/spdx-2-2-1. Are you saying that spec-parser should be enhanced to also produce this?

zvr avatar Aug 29 '25 20:08 zvr

> Are we talking about what would be at the end of a URL like https://spdx.org/rdf/3.0/terms/Core/Element ?

@zvr - yes

> Are you saying that spec-parser should be enhanced to also produce this?

That is my suggestion. A separate tool is also fine - whichever one is the most supportable and the path of least resistance IMHO. Since you've worked on both approaches, what do you suggest?

Whichever tool we use, I'd like to hook it into the spdx-spec CI/CD pipeline so we don't have to manually update the website once it is working.

goneall avatar Aug 30 '25 16:08 goneall

I strongly believe that, since these are the endpoints of our ontology, it would be better to have an ontology-related tool generate them. We should be feeding it the generated ontology and it would produce the corresponding "documentation" pages.

This way we will have another check that what we produce is actually what we meant.

Of course I completely agree with the automated setup remark. Although this will not actually be CI/CD -- as in: not "continuous", but on release only.

zvr avatar Sep 01 '25 10:09 zvr

@zvr I also mentioned ontology doc generators like LODE/PyLODE. I personally prefer single-page references. They can be used for complex ontologies like these for rail:

  • https://data-interop.era.europa.eu/era-vocabulary/
  • https://data-interop.era.europa.eu/era-vocabulary/rinf-appGuide/

However, having 2 versions of the documentation will be confusing and even frustrating for users. They will always wonder which is the authoritative version.

I raised a similar problem to ERA: those two above talk about nearly the same thing, so why 2 versions?

Are you prepared to give up the MkDocs documentation and switch to a generated reference? I think not. So we can start experimenting with LODE and the like, but the final decision is more difficult.

VladimirAlexiev avatar Nov 18 '25 04:11 VladimirAlexiev