JSON-LD 1.1 support, switch to titanium-jsonld?

Open VladimirAlexiev opened this issue 3 years ago • 12 comments

JSON-LD becomes more and more important, especially for Distributed Identifiers, Verifiable Credentials, IoT, etc.

Some initiatives are using JSON Schema to specify their JSONLD payload, which requires the ability to produce (write out) very precise JSONLD.

An examination of "JSON-LD" issues here shows a number of stalled issues and some bugs.

Some of them are due to this project (eg unable to specify context while writing/compacting), others are due to the underlying jsonld-java.

In particular, it's unclear whether jsonld-java will support 1.1. Two crucial 1.1 features are Framing and Scoped contexts.

I find https://github.com/jsonld-java/jsonld-java/pull/284#issuecomment-653521148 especially telling. Conformance test percentages are very low.

There's a suggestion to switch to https://github.com/filip26/titanium-json-ld. Its conformance test percentages are nearly perfect https://w3c.github.io/json-ld-api/reports/#subj_Titanium_JSON_LD_Java . A guy who seems to know stuff about JSONLD recommends it https://github.com/w3c/vc-data-model/issues/843#issuecomment-1024347690.

But a later comment states that jsonld-java is significantly more performant.

Please comment:

  • is improved JSON-LD important for you, in particular providing a context in write?
  • is JSON-LD 1.1 important for you?
  • should rdf4j switch to titanium-jsonld?

Cc @jeenbroekstra @JervenBolleman @fsteeg @ansell

VladimirAlexiev, Feb 01 '22 00:02

jsonld-java also does not have canonization (URDNA2015) https://github.com/jsonld-java/jsonld-java/issues/249, which is important for crypto signing apps. Titanium has it (as an extension): https://github.com/filip26/titanium-json-ld#extensions

https://github.com/kbss-cvut/jb4jsonld/issues/37 also considers switching to titanium.

VladimirAlexiev, Feb 01 '22 00:02

I haven't had any bandwidth to support JSONLD-Java and the 1.1 features have not been added, so no qualms with switching to a library that has 1.1 features implemented.

ansell, Feb 01 '22 01:02

Would it be an option to be able to switch between the two? I.e., there could be a second rio module for the titanium json-ld library, as long as the two aren't included at the same time. And if some refactoring of the way RDF4J sets options is done (e.g. #1755), it might provide a gentle upgrade path...

barthanssens, Feb 01 '22 18:02

@barthanssens That's an option, as soon as the pros and cons of each alternative are clearly described.

On the topic of Parser performance (direction JSONLD->RDF):

  • https://github.com/filip26/titanium-json-ld/issues/184: on a 10Mb file, titanium achieves 3k/sec and jsonld-java achieves 8k/sec
  • https://github.com/umbreak/jsonld-benchmarks: a more systematic benchmark shows titanium to be 10x slower than jsonld-java
  • https://github.com/filip26/titanium-json-ld/issues/209: the benchmark was rerun with the latest version of titanium
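The benchmark figures in this thread are quoted sometimes as throughputs, sometimes as "N times faster", and sometimes as percentages. A minimal sketch for putting them on one scale follows; the function names are hypothetical, and the inputs are the 3k/sec and 8k/sec figures from the first bullet, with units taken as quoted there.

```python
def speedup(fast, slow):
    """Return how many times faster `fast` is than `slow` (both are throughputs)."""
    if slow <= 0:
        raise ValueError("throughput must be positive")
    return fast / slow


def percent_faster(fast, slow):
    """Return the same comparison expressed as a percentage advantage."""
    return (speedup(fast, slow) - 1.0) * 100.0


# Figures quoted above for the 10Mb file: jsonld-java 8k/sec, titanium 3k/sec.
ratio = speedup(8000, 3000)
advantage = percent_faster(8000, 3000)
print(f"jsonld-java is {ratio:.2f}x the throughput of titanium ({advantage:.0f}% faster)")
```

On this scale, a "10x slower" result corresponds to a ratio of 10, while a "17% faster" result corresponds to a ratio of 1.17.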

Streaming may improve performance by significantly reducing required memory

  • https://github.com/eclipse/rdf4j/issues/2840 uses NDJSONLD to implement streaming (on top of jsonld-java?) but hasn't been benchmarked
  • https://github.com/rubensworks/jsonld-streaming-parser.js, https://github.com/rubensworks/jsonld-streaming-serializer.js are streaming implementations in JS using https://w3c.github.io/json-ld-streaming/. That spec says things like "put @context first in file" and "put @id first in object". So it doesn't rely on newline markers for streaming, but on a slightly restricted JSON structure.
  • https://github.com/rubensworks/jsonld-streaming-parser.js/issues/82: asked the author to benchmark
  • https://github.com/filip26/titanium-json-ld/issues/184 explains that it'd be hard to put streaming into titanium because that'd be a big architectural change
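The "slightly restricted JSON structure" idea can be illustrated with a small key-order check. This is only a sketch in Python using the standard `json` module; it covers just the two rules paraphrased above ("@context first", "@id before ordinary properties"), not the full streaming document form, and the function name is hypothetical.

```python
import json


def streaming_order_problems(text):
    """Return a list of key-order problems that would hinder streaming parsing.

    Simplified rules (an assumption, not the full spec):
    - if an object has "@context", it must be the first key;
    - if an object has "@id", no non-keyword property may precede it.
    """
    problems = []

    def check(pairs):
        # `pairs` preserves the key order exactly as written in the JSON text.
        keys = [key for key, _ in pairs]
        if "@context" in keys and keys[0] != "@context":
            problems.append("@context is not the first key")
        if "@id" in keys:
            before_id = keys[: keys.index("@id")]
            if any(not key.startswith("@") for key in before_id):
                problems.append("@id appears after a property")
        return dict(pairs)

    json.loads(text, object_pairs_hook=check)
    return problems


good = '{"@context": {"name": "http://schema.org/name"}, "@id": "http://example.org/a", "name": "A"}'
bad = '{"name": "A", "@id": "http://example.org/a", "@context": {"name": "http://schema.org/name"}}'
print(streaming_order_problems(good))
print(streaming_order_problems(bad))
```

A document that passes such a check lets a parser emit triples as soon as each property is read, because the context and the subject are already known at that point.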

VladimirAlexiev, Feb 02 '22 07:02

Jena has integrated Titanium to a large degree: https://issues.apache.org/jira/browse/JENA-1948.

  • reading is done
  • writing is in progress: https://issues.apache.org/jira/browse/JENA-2153

    09/Nov/21 Jena 4.2.0: Basic writing JSON-LD 1.1 is provided but no configurability such as exposing framing.

Update on https://github.com/umbreak/jsonld-benchmarks (great news!):

The Json-LD Java implementation is ~ 4.6 times faster in average than Titanium. In the current state (02.04.2022), the Titanium library is 2x faster than in its initial state (03.12.2020).

VladimirAlexiev, Apr 07 '22 09:04

https://github.com/json-ld/yaml-ld/issues/20#issuecomment-1180180856 has some info on JSON-LD 1.1 conformance, including a summary table.

Conformance leaders: Titanium (Java), JSON::LD (Ruby), PyLD (Python), jsonld.js (JavaScript)

VladimirAlexiev, Jul 11 '22 11:07

We need some request headers (or criteria) to decide when fetching JSON-LD data from the repo:

  • which JSON-LD version to return,
  • which library to use,
  • which context and frame to use. This is not very clear to me, so I asked https://github.com/w3c/json-ld-framing/issues/133.
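One existing mechanism that might cover part of this is the `profile` parameter of the `application/ld+json` media type, whose values under `http://www.w3.org/ns/json-ld#` (such as `compacted`, `expanded`, `flattened`, `framed`) describe the requested document form. The sketch below shows how a server could read it from an `Accept` header. It is a simplified parser (it assumes no commas inside quoted parameter values), the function name is hypothetical, and it does not address which library or JSON-LD version to use.

```python
JSONLD_NS = "http://www.w3.org/ns/json-ld#"


def requested_profiles(accept_header):
    """Return JSON-LD profile names requested for application/ld+json.

    Simplified: splits on "," and ";" without handling commas inside
    quoted strings, which is sufficient for space-separated profile URIs.
    """
    profiles = []
    for media_range in accept_header.split(","):
        parts = [part.strip() for part in media_range.split(";")]
        if parts[0].lower() != "application/ld+json":
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() != "profile":
                continue
            # The profile value is a quoted, space-separated list of URIs.
            for uri in value.strip().strip('"').split():
                if uri.startswith(JSONLD_NS):
                    profiles.append(uri[len(JSONLD_NS):])
    return profiles


header = 'application/ld+json;profile="http://www.w3.org/ns/json-ld#compacted", text/turtle;q=0.8'
print(requested_profiles(header))
```

Which concrete context or frame document to apply would still need a separate convention, which is the open question raised in the linked framing issue.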

This goes beyond jsonld 1.1 support and back to 1.0 support:

  • the currently used library jsonld-java can use any given context (right @ansell ?)
  • but there's no way in rdf4j to ask for a specific context (right @abrokenjester ?)

VladimirAlexiev, Aug 17 '22 12:08

It looks like, after the latest improvements in Titanium (v. 1.3.2), the Json-LD Java implementation is only ~17% faster on average than Titanium. And in one test Titanium even outperforms Json-LD Java by 17%.

amivanoff, Jul 19 '23 23:07

> It looks like, after the latest improvements in Titanium (v. 1.3.2), the Json-LD Java implementation is only ~17% faster on average than Titanium. And in one test Titanium even outperforms Json-LD Java by 17%.

Which implementations are you comparing? And what are you comparing?

hmottestad, Jul 20 '23 07:07

> > It looks like, after the latest improvements in Titanium (v. 1.3.2), the Json-LD Java implementation is only ~17% faster on average than Titanium. And in one test Titanium even outperforms Json-LD Java by 17%.
>
> Which implementations are you comparing? And what are you comparing?

It's the same test as in https://github.com/eclipse/rdf4j/issues/3654#issuecomment-1091454538

The test is https://github.com/umbreak/jsonld-benchmarks, but run with the latest titanium-jsonld 1.3.2 (March 2023) and the latest JSONLD-Java 0.13.4 (December 2021).

  • On average, Json-LD Java is still ~17% faster than Titanium, but not several times faster.
  • In one test, Titanium outperforms Json-LD Java by 17%.

amivanoff, Jul 20 '23 22:07

I'm working on contributing some performance optimisations to Titanium JSON-LD. Based on my single benchmark file with 600 000 triples I've currently managed to improve JSON-LD to RDF conversion by 3x, expanding by almost 2x and flattening by 4x.

hmottestad, Aug 01 '23 20:08

Impressive!

barthanssens, Aug 01 '23 21:08