titanium-json-ld

Expose the prefixes found in the top-level @context, including remote @context.

Status: Open. afs opened this issue 4 years ago • 10 comments

Prefixes have no standing in the RDF data model but they are convenient for display of URIs.

Describe the solution you'd like

Expose the compact URI prefix mapping from the top-level @context, maybe via a method RdfDataset.prefixes() that returns a Map<String, String>. This would be limited to the prefixes from the top-level @context: the active context in scope at the end of parsing the top-level JSON, after any nested local contexts have gone out of scope.

Describe alternatives you've considered

Secondary parsing at the JSON level of the JSON document (this is what Jena v4.2.0 does). This does not include remote @context, as that would require re-downloading the URL or interacting with a context cache.

Jena also requires the prefix URI to end in "/", "#" or ":", and Jena includes @vocab as prefix "". These are pragmatic Jena decisions that could be applied to the Map returned by Titanium.
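A minimal sketch of those Jena-style rules applied to a context, assuming the top-level @context has already been parsed into a plain Map<String, Object> (term to definition). Only string-valued definitions are considered; expanded term definitions (JSON objects) are skipped. PrefixFilter and prefixes() are hypothetical names for illustration, not Titanium or Jena API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Titanium or Jena.
final class PrefixFilter {

    // Keeps string-valued terms whose IRI ends in "/", "#" or ":" and maps @vocab to "".
    static Map<String, String> prefixes(Map<String, Object> context) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : context.entrySet()) {
            String term = e.getKey();
            Object value = e.getValue();
            if (!(value instanceof String)) {
                continue; // skips expanded term definitions (JSON objects), @version numbers, nulls
            }
            String iri = (String) value;
            if (term.equals("@vocab")) {
                result.put("", iri); // Jena convention: @vocab becomes the empty prefix
            } else if (!term.startsWith("@") // other keywords (@base, @language, ...) are not prefixes
                    && (iri.endsWith("/") || iri.endsWith("#") || iri.endsWith(":"))) {
                result.put(term, iri);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("@version", 1.1);
        context.put("foaf", "http://xmlns.com/foaf/0.1/");
        context.put("name", "http://xmlns.com/foaf/0.1/name"); // a term, not a prefix
        context.put("@vocab", "http://example.com/vocab#");
        System.out.println(prefixes(context));
        // prints {foaf=http://xmlns.com/foaf/0.1/, =http://example.com/vocab#}
    }
}
```

Because the filter works on the returned Map, it stays a consumer-side decision, which matches the suggestion that these rules be applied by Jena rather than by Titanium.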

Additional context

This came up as part of JENA-2187.

afs, Oct 28 '21

Hi @afs, thank you for reporting that. Please help me understand the issue in order to prepare test cases.

Do I understand it right that the goal is to generate RDF Turtle from a given JSON-LD input?

The JSON-LD to RDF algorithm expands the input, and the expanded input (all prefixes are lost at this step) is converted into a node map. So I'm thinking that maybe we could somehow use the compaction algorithm to get prefixed output, or just the prefixes.
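A simplified sketch of the step where a prefix disappears: a compact IRI such as foaf:name is replaced by the full IRI, and only the full IRI reaches the node map. It assumes a flat map of prefix definitions and deliberately ignores the rest of the JSON-LD 1.1 IRI expansion rules (plain terms, @vocab, @base, keywords); CompactIriExpansion.expand is a hypothetical name, not Titanium API.

```java
import java.util.Map;

// Hypothetical, simplified compact IRI expansion ("prefix:suffix" -> namespace + suffix).
final class CompactIriExpansion {

    static String expand(String value, Map<String, String> prefixes) {
        int colon = value.indexOf(':');
        if (colon < 0) {
            return value; // not a compact IRI; real expansion would also try terms and @vocab
        }
        String prefix = value.substring(0, colon);
        String suffix = value.substring(colon + 1);
        // "_:..." is a blank node identifier; a suffix starting with "//" means an absolute IRI.
        if (prefix.equals("_") || suffix.startsWith("//")) {
            return value;
        }
        String namespace = prefixes.get(prefix);
        // Only the concatenated IRI is returned; the prefix name itself is not kept anywhere.
        return namespace == null ? value : namespace + suffix;
    }

    public static void main(String[] args) {
        Map<String, String> context = Map.of("foaf", "http://xmlns.com/foaf/0.1/");
        System.out.println(expand("foaf:name", context)); // http://xmlns.com/foaf/0.1/name
    }
}
```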

filip26, Oct 28 '21

Hi @filip26,

Turtle output is one use; there are several different Turtle output styles, from "pretty" to a one-quad-per-line form, which is "N-Quads + prefixes". Output does not happen when the JSON-LD is read in; the steps are: read in, store, (later) write out.

Another use is converting URIs to convenient strings for UI display. In Jena, the dataset is the storage unit, and it carries some prefixes with it.

The prefixes normally come from the parser that reads the file to build the dataset.

The process of going from Titanium to Jena is:

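private void read(Document document, StreamRDF output, Context context) throws Exception {
    // JSON-LD to RDF: Titanium expands the document and builds an RdfDataset
    RdfDataset dataset = JsonLd.toRdf(document).get();
    // Secondary JSON-level pass over the document to find @context prefixes;
    // each one is sent to output.prefix(prefix, iri)
    extractPrefixes(document, output::prefix);
    // Convert each Titanium RdfNQuad to a Jena Quad and send it to output
    JenaTitanium.convert(dataset, output);
}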

https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/lang/LangJSONLD11.java

StreamRDF is the abstraction for sending parser output.

  • get the Titanium dataset
  • find prefixes and send to output
  • convert the list of Titanium RdfNQuad to Jena Quad and send to output.
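The three steps above can be sketched without Jena or Titanium on the classpath. QuadSink stands in for StreamRDF, CollectingSink loosely plays the role of a DatasetGraph that keeps prefixes, and quads are plain String arrays; all of these names are hypothetical. The point is only the shape: prefixes are sent to the same destination as the quads, before them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for StreamRDF; not the real Jena interface.
interface QuadSink {
    void prefix(String prefix, String iri);
    void quad(String subject, String predicate, String object, String graph);
}

// Hypothetical destination that keeps prefixes alongside the quads.
final class CollectingSink implements QuadSink {
    final Map<String, String> prefixes = new LinkedHashMap<>();
    final List<String[]> quads = new ArrayList<>();

    public void prefix(String prefix, String iri) { prefixes.put(prefix, iri); }

    public void quad(String s, String p, String o, String g) { quads.add(new String[] {s, p, o, g}); }
}

final class ReadPipeline {
    // Same order as read(): send the prefixes first, then convert and send every quad.
    static void read(Map<String, String> prefixes, List<String[]> quads, QuadSink output) {
        prefixes.forEach(output::prefix);
        for (String[] q : quads) {
            output.quad(q[0], q[1], q[2], q[3]); // q[3] == null means the default graph
        }
    }

    public static void main(String[] args) {
        CollectingSink sink = new CollectingSink();
        List<String[]> quads = new ArrayList<>();
        quads.add(new String[] {"_:b0", "http://xmlns.com/foaf/0.1/name", "\"Alice\"", null});
        read(Map.of("foaf", "http://xmlns.com/foaf/0.1/"), quads, sink);
        System.out.println(sink.prefixes + " " + sink.quads.size());
    }
}
```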

output typically writes into a Jena DatasetGraph, the storage abstraction.

DatasetGraph has a method prefixes() to return the prefixes carried by the dataset.

For:

{
    "@context": {
        "@version": 1.1,
        "foaf": "http://xmlns.com/foaf/0.1/",
        "skos": "http://www.w3.org/2004/02/skos/core#"
    }
}

I was hoping to have RdfDataset provide a map "foaf" -> "http://xmlns.com/foaf/0.1/" , "skos" -> "http://www.w3.org/2004/02/skos/core#".
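With such a map in hand, the UI display use mentioned earlier comes down to replacing the longest matching namespace with "prefix:". A plain-Java sketch using the foaf/skos map from this example; PrefixDisplay.shorten is a hypothetical name, and unlike a real Turtle writer it does not check that the remaining local name is legal syntax.

```java
import java.util.Map;

// Hypothetical display helper: IRI -> "prefix:localName" using the longest matching namespace.
final class PrefixDisplay {

    static String shorten(String iri, Map<String, String> prefixes) {
        String bestPrefix = null;
        String bestNamespace = "";
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String ns = e.getValue();
            // Prefer the longest namespace so nested namespaces win over shorter ones.
            if (iri.startsWith(ns) && ns.length() > bestNamespace.length()) {
                bestPrefix = e.getKey();
                bestNamespace = ns;
            }
        }
        // No match: fall back to the full IRI in angle brackets.
        return bestPrefix == null
                ? "<" + iri + ">"
                : bestPrefix + ":" + iri.substring(bestNamespace.length());
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = Map.of(
                "foaf", "http://xmlns.com/foaf/0.1/",
                "skos", "http://www.w3.org/2004/02/skos/core#");
        System.out.println(shorten("http://xmlns.com/foaf/0.1/name", prefixes));          // foaf:name
        System.out.println(shorten("http://www.w3.org/2004/02/skos/core#prefLabel", prefixes)); // skos:prefLabel
        System.out.println(shorten("http://example.com/other", prefixes));                // <http://example.com/other>
    }
}
```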

Conversion between systems: https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/system/JenaTitanium.java

afs, Oct 28 '21

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment!

github-actions[bot], Nov 28 '21

How to deal with conflicting prefixes?

e.g.

{
  "@context": {
    "name": "http://example.com/person#name",
    "details": "http://example.com/person#details"
  },
  "name": "Markus Lanthaler",
  "details": {
    "@context": {
      "name": "http://example.com/organization#name"
    },
    "name": "Graz University of Technology"
  }
}

converted into N-Quads:

_:b0 <http://example.com/person#details> _:b1 .
_:b0 <http://example.com/person#name> "Markus Lanthaler" .
_:b1 <http://example.com/organization#name> "Graz University of Technology" .

What keys should the prefix map contain?

filip26, Dec 12 '21

"name": "http://example.com/person#name" isn't really a prefix; it's a short name for a URI. Prefixes appear in Turtle as prefix:localName, so this is closer to "person": "http://example.com/person#" and then person:name.

Those can be nested as well, so there is a decision point here. There isn't a wrong answer.

RDF/XML can have nested XML namespace declarations (the XML equivalent of prefixes). It is quite unusual to see nested XML namespaces in RDF/XML; I think they would be more common in JSON-LD.

JSON is slightly different from XML: XML is parsed in encounter order, whereas a JSON object is a map.

Possibility 1: ignore the inner @context and only expose the document-wide declarations.
Possibility 2 (slightly more complicated): "put in as nested, outer overrides inner".

It probably makes sense for the outer, document definition to be in the final outcome.
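Both possibilities, with the document-level definitions winning, reduce to a choice of merge order. A hypothetical sketch with each context's prefixes already collected into a plain map; NestedPrefixMerge and its methods are illustrative names only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two options for nested @context prefixes.
final class NestedPrefixMerge {

    // Possibility 1: only the document-wide (top-level) declarations.
    static Map<String, String> topLevelOnly(Map<String, String> outer) {
        return new LinkedHashMap<>(outer);
    }

    // Possibility 2: include nested declarations, but the outer definition wins on a clash.
    static Map<String, String> outerOverridesInner(Map<String, String> outer,
                                                   List<Map<String, String>> nested) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> inner : nested) {
            // Among sibling nested contexts the first one visited is kept; JSON object
            // member order carries no meaning, so this tie-break is arbitrary.
            inner.forEach(result::putIfAbsent);
        }
        result.putAll(outer); // document-level definitions overwrite any inner ones
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> outer = Map.of("ex", "http://example.com/person#");
        Map<String, String> inner = Map.of(
                "ex", "http://example.com/organization#",
                "org", "http://example.com/organization#");
        System.out.println(topLevelOnly(outer));
        System.out.println(outerOverridesInner(outer, List.of(inner)));
    }
}
```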

HTH

afs, Dec 13 '21

If the given example should produce a prefix map like this one:

{ 
  "person":  "http://example.com/person#", 
  "organization":  "http://example.com/organization#"
}

then we have to develop an algorithm for extracting and naming prefixes from the JSON-LD context. Perhaps we could start with a map of well-known prefixes (foaf, skos, ...).
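A sketch of the "well-known prefixes" starting point: look each namespace up in a table of familiar prefixes and fall back to a generated name. The table contents and the ns0, ns1, ... fallback are assumptions for illustration, and WellKnownPrefixes is a hypothetical name. The table is a parameter so that a consumer could pass its own list instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical naming helper: namespace IRI -> well-known prefix, else a generated one.
final class WellKnownPrefixes {

    // Illustrative table (namespace -> conventional prefix); callers may supply their own.
    static final Map<String, String> WELL_KNOWN = Map.of(
            "http://xmlns.com/foaf/0.1/", "foaf",
            "http://www.w3.org/2004/02/skos/core#", "skos",
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf");

    static Map<String, String> name(List<String> namespaces, Map<String, String> table) {
        Map<String, String> result = new LinkedHashMap<>(); // prefix -> namespace
        int counter = 0;
        for (String ns : namespaces) {
            if (result.containsValue(ns)) {
                continue; // this namespace already has a prefix
            }
            String prefix = table.get(ns);
            if (prefix == null || result.containsKey(prefix)) {
                // Unknown namespace (or name already taken): generate ns0, ns1, ...
                do {
                    prefix = "ns" + counter++;
                } while (result.containsKey(prefix));
            }
            result.put(prefix, ns);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(name(
                List.of("http://xmlns.com/foaf/0.1/", "http://example.com/person#"), WELL_KNOWN));
        // prints {foaf=http://xmlns.com/foaf/0.1/, ns0=http://example.com/person#}
    }
}
```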

The other option is to generate the prefix map from the N-Quads, using part of each URL as the prefix name.
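A sketch of that second option, assuming the namespace is everything up to the last "#" (or, failing that, the last "/") and the prefix name is the last path segment of the namespace. Applied to the predicate IRIs of the N-Quads in the example, it yields the person/organization map shown above. DerivedPrefixes is a hypothetical name, and the naming rule is one arbitrary choice among many.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: derive a prefix map from IRIs by splitting off the local name.
final class DerivedPrefixes {

    // Namespace = IRI up to and including the last '#', or else the last '/'.
    static String namespaceOf(String iri) {
        int cut = iri.lastIndexOf('#');
        if (cut < 0) {
            cut = iri.lastIndexOf('/');
        }
        return cut < 0 ? null : iri.substring(0, cut + 1);
    }

    // Candidate name = last path segment of the namespace, e.g. ".../person#" -> "person".
    static String nameFor(String ns) {
        String trimmed = ns.substring(0, ns.length() - 1); // drop the trailing '#' or '/'
        String segment = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        String cleaned = segment.replaceAll("[^A-Za-z0-9]", "");
        // Fall back to "ns" when nothing usable remains (e.g. a version segment like "0.1").
        return cleaned.isEmpty() || !Character.isLetter(cleaned.charAt(0)) ? "ns" : cleaned;
    }

    static Map<String, String> fromIris(List<String> iris) {
        Map<String, String> result = new LinkedHashMap<>(); // prefix -> namespace
        for (String iri : iris) {
            String ns = namespaceOf(iri);
            if (ns == null || result.containsValue(ns)) {
                continue;
            }
            String base = nameFor(ns);
            String prefix = base;
            for (int i = 1; result.containsKey(prefix); i++) {
                prefix = base + i; // resolve clashes: vocab, vocab1, vocab2, ...
            }
            result.put(prefix, ns);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fromIris(List.of(
                "http://example.com/person#details",
                "http://example.com/person#name",
                "http://example.com/organization#name")));
        // prints {person=http://example.com/person#, organization=http://example.com/organization#}
    }
}
```

For a namespace like http://xmlns.com/foaf/0.1/ this rule only produces "ns", which is one reason to combine it with a table of well-known prefixes.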

filip26, Dec 14 '21

Just an aside, from another point of view: as I understand it, prefixes are about readability. So in some cases it would be more useful for a consumer to provide its own list of well-known prefixes in order to get easily readable output.

filip26, Dec 14 '21

Yes. The user can add them to the Jena graph, for example, or even read a Turtle file that only has prefixes. This happens when loading N-Triples (no prefixes, but common for large database dumps) and the user wants some nicer output.

afs, Dec 14 '21

I'm preparing a low-level JsonLdProcessor API that will allow you to grab a context and/or optimize processing. The target version is 1.3.0.

filip26, Dec 19 '21

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment!

github-actions[bot], Jan 19 '22

V2 has been canceled because of lack of funding.

filip26, Jun 20 '24

Sad to hear that v2 is cancelled.

afs, Jun 20 '24

@afs I'm sorry, but I have no other option. I hear Titanium has millions of production installations in total across various companies, but none is willing to pay a few $ back.

filip26, Jun 20 '24

I'm also sorry to hear that v2 has been canceled.

hmottestad, Jun 22 '24