rdflib-jsonld

Schema.org moving from HTTP Content Negotiation to JSON-LD 1.1 "Link:" header for context file

Open danbri opened this issue 5 years ago • 9 comments

This happened faster than planned due to a DoS attack this week; details are in https://github.com/schemaorg/schemaorg/issues/2578#issuecomment-632227864

Schema.org no longer publishes a JSON-LD context file using HTTP content negotiation. Our homepage URL always returns HTML. This affects the parsing of all JSON-LD that expects to get a context definition from URLs "http://schema.org", "https://schema.org", "http://schema.org/", "https://schema.org/".

The URL of our context file is https://schema.org/docs/jsonldcontext.jsonld

We will shortly update the site to declare this URL via a Link header (see above issue for details).

I am filing this issue

  • Firstly to give you background knowledge in case people report JSON-LD parsing problems here
  • To encourage implementation of the JSON-LD 1.1 "Link" header discovery mechanism which AFAIK from my quick tests isn't yet supported in RDFLib
  • To encourage discussion of caching / robustness, since there is no guarantee that this file will remain accessible 24x7 indefinitely.
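On the caching / robustness point, here is a minimal sketch of one possible approach: try a fresh download, and fall back to the last good local copy if the download fails. `ContextCache` and the injected `fetch` callable are hypothetical names used for illustration only; nothing like this is part of RDFLib as far as this thread shows.

```python
import hashlib
import json
from pathlib import Path


class ContextCache:
    """Hypothetical helper: keep the last good copy of each context file on disk."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # fetch is any callable taking a URL and returning the body as text
        # (an assumption; in practice it might wrap urllib.request.urlopen).
        self.fetch = fetch

    def _path(self, url):
        # Hash the URL so it is always a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".json")

    def load(self, url):
        path = self._path(url)
        try:
            text = self.fetch(url)
            json.loads(text)  # refuse to cache a body that is not JSON (e.g. HTML)
            path.write_text(text, encoding="utf-8")
        except (OSError, ValueError):
            # Network failure or non-JSON response: use the cached copy if any.
            if not path.exists():
                raise
            text = path.read_text(encoding="utf-8")
        return json.loads(text)
```

Validating the body before caching matters here because the homepage URL now returns HTML, which would otherwise overwrite a good cached context.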

danbri avatar May 21 '20 17:05 danbri

The main Schema.org site should have the headers discussed now, i.e.

  • access-control-allow-credentials: true
  • access-control-allow-headers: Accept
  • access-control-allow-origin: *
  • link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
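For implementers, a rough sketch of how a client might pick the context URL out of such a `link` header. The function name `find_jsonld_alternate` is hypothetical, and the regex-based parsing is a simplification (for example, it would be confused by a `<` inside a quoted parameter value); it is not RDFLib's implementation.

```python
import re
from urllib.parse import urljoin


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of the first link with rel="alternate" and
    type="application/ld+json", or None if there is no such link."""
    # Each link-value looks like: <target>; name=value; name="value"
    for target, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        attrs = {
            name.lower(): quoted or bare
            for name, quoted, bare in re.findall(
                r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))', params)
        }
        rels = attrs.get("rel", "").lower().split()
        if "alternate" in rels and attrs.get("type", "").lower() == "application/ld+json":
            # The target may be relative, so resolve it against the requested URL.
            return urljoin(base_url, target)
    return None


header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(find_jsonld_alternate(header, "https://schema.org/"))
```

A client would only need this step when the response for the context URL is not itself JSON, which is now the case for the schema.org homepage.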

danbri avatar May 21 '20 17:05 danbri

@hsolbrig can you suggest a workaround, at least for short term use? Can we pass in the context when invoking parser (by URL or by content?) /cc @Gnomus042

danbri avatar Jul 02 '20 14:07 danbri

Is there no way to do this without requiring a custom HTTP header? Why is that part of the data specified out-of-band from the rest of the document?

(edit) Static files (with no HTTP server configuration dependency) are most scalable and archivable.

westurner avatar Sep 18 '20 01:09 westurner

@rob-metalinkage, is this going to cause problems for JSON -> JSON-LD expansion due to the separate context?

nicholascar avatar Sep 18 '20 11:09 nicholascar

@danbri, @westurner, @hsolbrig RDFLib maintainers are assembling volunteers to complete this tool's JSON-LD 1.1 implementation and then to merge it into RDFLib core. That should make it easier for all to just "do" JSON-LD with RDFLib.

nicholascar avatar Sep 18 '20 11:09 nicholascar

@nicholascar I don't think it causes any extra problems, as using just a model namespace to perform JSON -> JSON-LD expansion is unsafe anyway.

The patterns that appear in the wild seem to be:

Data model = X, context URI = <some URL similar to X>.json

i.e. there is no way to discover the relevant context file for a model X.

Or the pattern relies on content negotiation against the model URI itself:

Data model = X, context = X (Accept: ld+json)

This is being taken off the table as a bad idea according to this issue, but IMHO it has a deeper problem: if your data model is described in OWL, then ld+json should be the data model serialised as JSON-LD, not necessarily a JSON-LD context for the model.

The options for a canonical mechanism to discover the actual URL of a context seem to be:

a) <Model X> returns a Link header for alternates
b) <Model X> supports a profile "alt", which can be requested either by a header or by a URL parameter (<X?_profile=jsoncontext>), where the profile jsoncontext is registered and well known (dx-prof-conneg)

If dx-prof-conneg supports the same Link syntax, and a resource chooses by default to embed the Link headers for all the profiles and serialisations available from the "alt" view, then I think the two approaches are consistent.

I'd always choose the latter, as JSON context is not the only resource I'd want to be able to discover about a model. JSON-schema is also valuable, and SHACL and HTML and maybe other forms.

rob-metalinkage avatar Sep 18 '20 14:09 rob-metalinkage

Maybe I'm misunderstanding? From https://www.w3.org/TR/json-ld11/#the-context ::

Contexts can either be directly embedded into the document (an embedded context) or be referenced using a URL. Assuming the context document in the previous example can be retrieved at https://json-ld.org/contexts/person.jsonld, it can be referenced by adding a single line and allows a JSON-LD document to be expressed much more concisely as shown in the example below:

{
 "@context": "https://json-ld.org/contexts/person.jsonld",

westurner avatar Sep 18 '20 15:09 westurner

@westurner you are right, it doesn't necessarily need a custom header, but there are a couple of things that need care here:

  1. the agent that is "adding a single line" somehow has to know the URL "https://json-ld.org/contexts/person.jsonld"

We can say it is up to client code to tell RDFLib exactly what to include and maybe not think about this, but this issue is about other approaches, such as trying to resolve namespaces like schema.org and getting a context.

  2. contexts may include other contexts, so the behaviour needs to be explicit about exactly how to handle potential conflicts (prefix strings bound to different URIs) and default namespaces (@vocab, @base). Having explored this, I find the JSON-LD documentation extremely hard to follow and lacking basic examples, and RDFLib is silent. IMHO RDFLib should encapsulate and explain basic practices here, so that getting started does not require interpreting the JSON-LD specification.

  3. there seem to be quite a lot of ways to bundle a set of object descriptions in JSON-LD, including arrays, @graph constructs, container objects etc. The JSON-LD serialiser probably needs to be able to handle these if we want to deliver a serialisation for use in a specific context, such as to meet an API payload requirement. The JSON-LD framing spec makes this clear - see #95
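To make the conflict case concrete, here is a small sketch of the "later definition wins" rule that applies when several contexts are processed in order. It is a deliberate simplification that only handles flat term-to-IRI mappings; it ignores `@protected` terms, scoped contexts, `null` resets and expanded term definitions, and `merge_contexts` is a hypothetical helper, not an RDFLib function.

```python
def merge_contexts(contexts):
    """Merge flat context dicts in order; a later definition replaces an earlier one.

    Returns (merged, conflicts), where conflicts lists every term that was
    rebound to a different IRI as (term, old_iri, new_iri).
    """
    merged = {}
    conflicts = []
    for ctx in contexts:
        for term, iri in ctx.items():
            if term in merged and merged[term] != iri:
                conflicts.append((term, merged[term], iri))
            merged[term] = iri
    return merged, conflicts


merged, conflicts = merge_contexts([
    {"dc": "http://purl.org/dc/elements/1.1/", "name": "http://schema.org/name"},
    {"dc": "http://purl.org/dc/terms/"},
])
for term, old, new in conflicts:
    print(f"{term}: {old} was replaced by {new}")
```

Reporting the rebinding rather than silently applying it is the kind of behaviour that could be documented explicitly.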

rob-metalinkage avatar Sep 21 '20 01:09 rob-metalinkage

I think the following code (which fails to load the schema.org context) is related to the present problem, but I don't understand the workaround:

```python
from rdflib import Graph, plugin
from rdflib.serializer import Serializer

# The "@context" below points at the schema.org homepage, which now returns HTML.
jsonldSample = """
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "La Tour Eiffel",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Paris",
    "addressRegion": "75007",
    "streetAddress": "Champ de Mars, 5 Avenue Anatole France"
  },
  "description": "Monument emblématique de Paris, la tour Eiffel est une tour de fer puddlé de 324 mètres de hauteur construite par Gustave Eiffel à l’occasion de l’Exposition Universelle de 1889 et qui célébrait le premier centenaire de la Révolution française.",
  "url": "https://www.toureiffel.paris",
  "image": "https://www.toureiffel.paris/sites/default/files/2017-10/monument-landing-header-bg_0.jpg",
  "pricerange": "de 2,5 à 25 euros",
  "telephone": "08 92 70 12 39"
}
"""

# Parse the sample, then print it back as JSON-LD and as N-Triples.
g = Graph().parse(data=jsonldSample, format='json-ld')
print(g.serialize(format='json-ld', indent=4))
print(g.serialize(format='nt', indent=4))
```
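One possible workaround, sketched under the assumption that the parser can dereference the context file URL given at the top of this issue: rewrite the document's `@context` before parsing so that it points at https://schema.org/docs/jsonldcontext.jsonld instead of the homepage. `pin_schema_org_context` is a hypothetical helper, and it only touches the top-level `@context` (a string or a list), not nested ones.

```python
import json

# The four homepage URLs named in the issue description.
SCHEMA_ORG_HOMEPAGES = {
    "http://schema.org", "https://schema.org",
    "http://schema.org/", "https://schema.org/",
}
# The context file URL given in the issue description.
SCHEMA_ORG_CONTEXT = "https://schema.org/docs/jsonldcontext.jsonld"


def pin_schema_org_context(doc):
    """Return a shallow copy of doc whose top-level @context references to the
    schema.org homepage are replaced by the explicit context file URL."""
    def fix(entry):
        if isinstance(entry, str) and entry in SCHEMA_ORG_HOMEPAGES:
            return SCHEMA_ORG_CONTEXT
        return entry

    fixed = dict(doc)
    ctx = doc.get("@context")
    if isinstance(ctx, list):
        fixed["@context"] = [fix(entry) for entry in ctx]
    elif ctx is not None:
        fixed["@context"] = fix(ctx)
    return fixed


sample = json.loads('{"@context": "https://schema.org", "@type": "LocalBusiness"}')
print(json.dumps(pin_schema_org_context(sample)))
```

With the sample above, this would mean passing `json.dumps(pin_schema_org_context(json.loads(jsonldSample)))` as `data` to `Graph().parse(...)` in place of the raw string. That still needs network access to fetch the context file.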

datamusee avatar Dec 05 '21 09:12 datamusee