Polyglot Modeling
WHO: As an information architect
WHAT: I want data modeling language(s) independent of technical artefacts
WHY: So that:
- the language is understandable to domain experts
- it can generate a variety of required technical artefacts
- all these artefacts are kept in sync, thus lowering maintenance effort
For efficient RDF modeling, you need to define multiple related artefacts:
- ontology
- shapes (SHACL (@holgerknublauch) or SHEX (@ericprud))
- diagrams and other documentation
- JSON-LD context
- maybe JSON-LD frames
- JSON Schema or Avro schema
- API bindings and hypertext controls (HATEOAS)
- etc
Thus, many people have expressed the desire to define a unified or "technology independent" modeling language.
- See eg https://github.com/w3c-ccg/traceability-vocab/issues/296 for a brief list of modeling framework requirements
Many communities like to have an LD expression of their data, but mostly care about defining the data with JSON Schema. Efforts to marry JSON Schema with JSON-LD contexts have been undertaken in:
- w3c-ccg: @OR13 @nissimsan @msporny
- WoT: https://www.w3.org/2019/wot/json-schema, http://www.w3.org/2019/wot/hypermedia (@vcharpenay, @maximelefrancois86, María Poveda Villalón)
- OAS: OpenAPI-Specification, OAS Semantic Context, in particular for eGovernment APIs (@ioggstream @pchampin @giorgialodi)
Examples of polyglot frameworks include the following (many but not all are YAML-based):
- LinkML (github) (@cmungall)
- FHIR (@ericprud): note, this is not YAML-based
- ShExC/ShExJ/ShExR, and now YAML representations (ShExY?) (@ericprud)
- Schema Salad (@mr-c @tetron)
- A.ML and cloudinformationmodel @danmoralesatx
- RAML (RESTful API Modeling Language), https://github.com/raml-org/raml-spec @sichvoge @krishahn
- Dragon at Uber (Joshua Shinavier and others). Eg see Dragon: Schema Integration at Uber Scale, US Semantic Technologies Symposium, March 2020, cached PDF. YAML schema examples start slide 42
- SOML (@VladimirAlexiev) used in the Ontotext Platform to generate GraphQL querying, mutations, and SHACL shapes.
- smart-data-models (@albertoabellagarcia) (FIWARE, IUDX, SmartCities, TM Forum). Example: Aircraft. See Contribution manual gslides, EU DATA SP4CE - final event - Brussels - 31 May 2024 (manufacturingdataspace-csa.eu). The single source of truth is YAML, but there are systemic modeling problems: https://github.com/smart-data-models/dataModel.EnergyCIM/issues/3
- shapiro by @mathiasrichter uses SHACL for modeling, converts models to JSON-Schema, and can serve them as HTML, SHACL, JSON-Schema
- yml2vocab by w3c (@iherman, @msporny, @gkellogg), used widely by the VC community. Generates ontology, context and HTML like W3C specs with ReSpec (see examples: test.yml, rendered test.html; previews, security/vocabulary.yml)
- Target Vocabulary Maps (@niklasl): a simpler way of getting data using "Ontology A" into data using "Ontology B": maps the used terms from A to B, taking some variances such as granularity into account. See SWIB 2019: slides, video. Used in the Libris XL library system to harmonize data.
- TreeLDR (@timothee-haudebourg)
- Semantic Treehouse (@oosterheertl, @Michiel-s, @WvandenBerg) that is used as a Vocabulary Hub in data spaces.
- jargon.sh (@jargon.sh) that's used by UNECE and UNTP, eg for https://jargon.sh/user/unece/DigitalProductPassport
- OpenSemanticLab (@simontaurus, @SimonStier, @raederan), blending JSON-SCHEMA and JSON-LD as Object-Oriented Linked Data Schema (OO-LD). JSON playground, YAML playground. Pub: Linked Data Schema Repositories for Interoperable Data Spaces (2023)
YAML-LD should not take over these modeling-framework efforts, but should show how they can be used together, show archetypical examples, and maybe make a comparison.
Even if no Modeling requirements make it into any YAML-LD spec, this git repo could serve as a "meeting place" for people working on these topics, potentially resulting in some unification.
This was discussed during today's call: https://json-ld.org/minutes/2022-06-22/.
Another polyglot example: ShExC/ShExJ/ShExR. YAML representations (ShExY?) assume that json2yaml covers the subset of JSON needed by ShExJ, i.e. JSON with arrays and objects with ASCII keys.
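The json2yaml assumption can be checked directly, since YAML is a superset of this JSON subset: a round trip through YAML should preserve the document exactly. A minimal sketch, assuming PyYAML is installed; the sample document is merely shaped like ShExJ, not a real ShEx schema.

```python
import yaml  # PyYAML, assumed available

# A ShExJ-style document: objects with ASCII keys, arrays, and strings only.
shexj = {
    "type": "Schema",
    "shapes": [{
        "type": "Shape",
        "expression": {
            "type": "TripleConstraint",
            "predicate": "http://schema.org/name",
        },
    }],
}

as_yaml = yaml.safe_dump(shexj)          # the "ShExY" rendering
round_tripped = yaml.safe_load(as_yaml)  # back to plain dicts and lists
assert round_tripped == shexj            # nothing lost in the round trip
```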
On the topic of semantic polyglot mechanisms (i.e. not dependent upon syntactic, "irrelevant" idiosyncrasies), I'd like to add Target Vocabulary Maps as a reference. It is focused upon a simpler way of getting data using "Ontology A" into data using "Ontology B" than using OWL reasoners. This mechanism instead maps the used terms from A to B, taking some variances such as granularity into account.
The relevance here is mostly about contrast: TVMs should have nothing to do with syntax, but the form and meaning of the information expressed. Wherein, I'd suggest, lie the real and hard problems of interoperability.
My worry is that parts of the problems tackled in the efforts listed in this issue stem from a failure to separate syntactical differences (which should be about "typography" / editorial ergonomics) from semantic differences (the topic of the role, denotation and scope of identities, and different perspectives on temporality, generality, granularity). I believe we must both recognize that the latter are reasonably orthogonal to syntax — but also that it is much harder to separate them in practice (since syntaxes and tools are all we have). Therefore diligence is required to keep them distinct.
One of the most challenging things here is the (quite reasonable) motivation to package and insulate specific solutions (any given "simple REST API" is such a thing). I think JSON-LD in essence is an attempt to bridge those isolated solutions and the Semantic Web. By extension, so is LinkML (I'd say that LinkML, through extension, is even more "tooling-oriented"). However, neither of these really attempts to unify the resulting combinatorial knowledge graphs. They "just" result in URIs for all the terms. That is a good start (I'd say a necessity of sorts), but without the Semantic Layer Cake — or something simpler — we've actually gotten nowhere in terms of combining the graphs. The resulting "triple soup" becomes unintelligible in its proliferation of terms. That is a hard problem. Syntactic differences and contextual "walls" are just symptoms and mitigations.
Methods for guiding or restricting forms of expression may be an aid. Those are commonly realized by structures such as enumerations and schema restrictions within or upon term definitions, either tied to vocabulary terms, or application ("context" or "profiles"), or both ("application vocabularies"). I believe we need to seek a fruitful, simplified convergence here rather than to attempt to seek "independence", which actually introduces, indirected, multilayered dependence.
That, however, is reasonably not the task of YAML-LD. What I do believe is that it is up to YAML-LD not to further the divide by isolation, by which I mean it should avoid inventing solutions that only work within its particular syntax. Or, if that is too alluring (well, valuable), we must learn from it and attempt to generalize its solution beyond the syntax.
Of course, that position relies upon the assumption that a fundamental RDF-based substrate is of value, and that RDF vocabularies themselves can be practically used for interoperability. (An opposite position would be that the type system of a particular language (say Haskell) is sufficient and the actual limit beyond which no practical interoperability can be achieved. Even REST APIs will, by and large, be divided by application, mere simplified—even crass and ephemeral—surfaces for certain interaction with a system realized through its internal, isolated application complexity.)
I'd describe JSON-LD as doing purely syntactic transformations. There's the intention of mapping to something more semantic, but that's realized not by JSON-LD but by the `@context` author, whose instructions may produce something that unifies with other SemWeb documents. I'd reserve the term "semantic transformation" for toolchains that can infer conceptual equivalence and perform both labeling and structural changes. To wit, sed does not perform semantic transformations, even if the resulting document may have more useful semantics.
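To make the "purely syntactic" point concrete, here is a toy sketch (not the JSON-LD expansion algorithm; `apply_context` is an invented helper): the context acts as a key-renaming table, and everything semantic lives in the mapping its author wrote, not in the mechanism.

```python
# Toy illustration of a context as a key-renaming table. The rewrite
# itself is purely syntactic; any semantic alignment was decided by
# whoever authored the mapping.
def apply_context(doc: dict, context: dict) -> dict:
    """Rewrite top-level keys of `doc` using the alias map in `context`."""
    return {context.get(key, key): value for key, value in doc.items()}

context = {"name": "http://schema.org/name"}
doc = {"name": "Alice", "age": 42}

expanded = apply_context(doc, context)
assert expanded == {"http://schema.org/name": "Alice", "age": 42}
```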
This issue was discussed in today's meeting.
@ericprud -- Please edit your https://github.com/json-ld/yaml-ld/issues/19#issuecomment-1179545426 to put a code fence around @context so that GitHub user doesn't get pinged every time this issue is updated.
I took care of it.
Regarding the comment of @VladimirAlexiev about data models defined at Smart Data Models: it is true that there is a YAML version of all schemas and that this version is core to the development of many digital assets. But it is also true that the original single source of truth is a JSON Schema. Why? Possibly both are valid solutions; we just started with JSON Schema, and there are many libraries for validation. (Maybe there are also libraries for YAML.)
@niklasl I've now read your SWIB 2019 slides and I like what you say on p23
- Paths include property chain axioms with range-restricted subproperties.
- Statement-like entities can provide direct predicates from e.g. qualified events.
This reminds me strongly of tricks described here:
- Extending OWL2 Property Constructs with OWLIM Rules. Alexiev, V. Technical Report, Ontotext Corp, September 2014.
Going to slide 26: whereas you "bastardize" PCA to say

```turtle
schema:isbn owl:propertyChainAxiom (
    [ rdfs:subPropertyOf bf:identifiedBy ; rdfs:range bf:Isbn ]
    rdf:value
) .
```
I would represent it with a dedicated construct, e.g.

```turtle
:bf_identifiedBy__schema_isbn a ptop:PropChainType1 ;
    ptop:premise1   bf:identifiedBy ;
    ptop:type1      bf:Isbn ;
    ptop:premise2   rdf:value ;
    ptop:conclusion schema:isbn .
```
And then implement it with a rule like this:

```
Id: PropChainType1
  t <rdf:type>        <ptop:PropChainType1>
  t <ptop:premise1>   p1
  t <ptop:premise2>   p2
  t <ptop:type1>      t1
  t <ptop:conclusion> q
  x p1 y
  y <rdf:type> t1
  y p2 z
  ----------------
  x q z
```
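The rule can be sketched as a naive forward-chaining pass over a triple set. This is an illustration only, not an OWLIM/GraphDB rule engine; IRIs are abbreviated as CURIE strings and the rule node `:chain` is a hypothetical name.

```python
# Naive one-pass evaluation of the PropChainType1 rule sketched above.
def prop_chain_type1(triples):
    """Return the triples inferred by one pass of PropChainType1."""
    t = set(triples)

    def objects(s, p):
        return [o for (s2, p2, o) in t if (s2, p2) == (s, p)]

    inferred = set()
    rules = [s for (s, p, o) in t
             if p == "rdf:type" and o == "ptop:PropChainType1"]
    for rule in rules:
        for p1 in objects(rule, "ptop:premise1"):
            for p2 in objects(rule, "ptop:premise2"):
                for t1 in objects(rule, "ptop:type1"):
                    for q in objects(rule, "ptop:conclusion"):
                        # x p1 y . y a t1 . y p2 z  =>  x q z
                        for (x, pa, y) in t:
                            if pa == p1 and (y, "rdf:type", t1) in t:
                                for z in objects(y, p2):
                                    inferred.add((x, q, z))
    return inferred

data = {
    (":chain", "rdf:type", "ptop:PropChainType1"),
    (":chain", "ptop:premise1", "bf:identifiedBy"),
    (":chain", "ptop:premise2", "rdf:value"),
    (":chain", "ptop:type1", "bf:Isbn"),
    (":chain", "ptop:conclusion", "schema:isbn"),
    (":book", "bf:identifiedBy", ":isbn1"),
    (":isbn1", "rdf:type", "bf:Isbn"),
    (":isbn1", "rdf:value", "0123456789"),
}
assert prop_chain_type1(data) == {(":book", "schema:isbn", "0123456789")}
```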
I did plenty of GLAM work in the past, and recently applied the above constructs to a RiCO case, using PROV patterns: Rolification vs Qualification: https://github.com/ICA-EGAD/RiC-O/issues/67#issuecomment-1919383104
Since it was mentioned in the (edited) https://github.com/json-ld/yaml-ld/issues/19#issue-1255308854 by @VladimirAlexiev: in the scientific context of OpenSemanticLab (see also OpenSemanticWorld-Package-Registry) we are picking up the idea of blending JSON-SCHEMA and JSON-LD as Object-Oriented Linked Data Schema (OO-LD). The core idea is that an OO-LD document is always both a valid JSON-SCHEMA and a JSON-LD remote context (!= JSON-LD document). In this way a complete OO-LD class/schema hierarchy is consumable by JSON-SCHEMA-only and JSON-LD-only tools, while OO-LD-aware tools can provide extended features on top (e.g. UI autocomplete dropdowns for string-IRI fields based on, e.g., a SPARQL backend, SHACL shape or JSON-LD frame generation). The regular playground already provides an example by combining a JSON-SCHEMA-only HTML form generator and the JSON-LD-only JSON-LD playground. I would regard YAML(-LD) in this context merely as a human-friendly syntax synchronized with text editors by using standard YAML2JSON round-trips, but keep all transport and storage in JSON(-LD), as the YAML playground demonstrates.
@simontaurus thanks! It is great to see new products and technologies built with LD and even more so with YAML. Would you want to use a Convenience Context (with dollar signs or without them) to avoid the need to escape @-keywords except @context, or is it not applicable in the context of OO-LD documents?
@anatoly-scherbakov: In general we want to keep keywords in 'instance' JSON documents (=> property names in schemas) strictly matching `^[A-Za-z_]+[A-Za-z0-9_]*$` to avoid escaping or replacing when mapping to other languages. This works well with aliasing, e.g.
```json
{
  "@context": {
    "schema": "http://schema.org/",
    "name": "schema:name",
    "type": "@type"
  },
  "title": "Person",
  "type": "object",
  "properties": {
    "type": {
      "type": "string",
      "default": "schema:Person"
    },
    "name": {
      "type": "string",
      "description": "First and Last name"
    }
  }
}
```
translates smoothly to Python (pydantic) via https://github.com/koxudaxi/datamodel-code-generator:

```python
class Person(BaseModel):
    type: Optional[str] = "schema:Person"
    name: Optional[str] = None
    """First and Last name"""
```
which would not be the case if we used `@type` or `schema:name` as property names.
UPDATE: See also python playground
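The identifier restriction can be checked mechanically. A small sketch using the usual identifier pattern (`^[A-Za-z_][A-Za-z0-9_]*$`): names that match map cleanly to property names in Python, Java, GraphQL and friends; the special-character keywords do not.

```python
import re

# Identifier-safe names survive code generation unescaped; JSON-LD and
# JSON Schema keywords like "@type" and "$id" would need renaming.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

candidates = ["type", "name", "works_for", "@type", "schema:name", "$id"]
safe = [name for name in candidates if IDENT.match(name)]
assert safe == ["type", "name", "works_for"]
```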
I would prefer that both JSON-LD and JSON-SCHEMA hadn't introduced special-character keywords (@*, $*), but since they have, there seems (currently) to be no way around having @ and $ in schema and context definitions since, in the case of JSON-LD, "Aliased keywords may not be used within a context, itself".
Defining a Convenience Context like this one in YAML-LD would help, but it would prevent usage of generic converters like js-yaml as long as this is not reflected in JSON-LD; IMHO not a good trade and a likely source of misunderstanding. Ideally there would be a JSON-LD version 2.x allowing aliases within a context, or even defining built-in aliases like ld_id, ld_type, ld_context, etc.
@simontaurus wrote:
The core idea is that an OO-LD document is always both a valid JSON-SCHEMA and a JSON-LD remote context
That is really neat; very powerful. Keep going!
@simontaurus thanks for the info! I am a frequent user of pydantic but haven't seen this generation tool before.
If we could define aliases usable inside contexts, then writing contexts in YAML-LD would be much more feasible. Currently, the author has to live with a lot of quoting due to the @ characters, which does not make for a friendly user experience.
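For illustration, the same document written with raw keywords versus a hypothetical dollar-sign convenience alias; YAML reserves `@` at the start of a plain scalar, so only the aliased form escapes the quoting burden:

```yaml
# Raw JSON-LD keywords must be quoted in YAML:
"@context": https://schema.org/
"@id": https://example.org/alice
"@type": Person
name: Alice
---
# Aliased (hypothetical $-convention): no quoting needed
$context: https://schema.org/
$id: https://example.org/alice
$type: Person
name: Alice
```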
Originally we thought we'd replace each @-keyword with a $-keyword in YAML-LD processor but we opted to use convenience contexts instead; your use case brings me to think this was the right course of action if we want to maintain compatibility with other technologies.
@simontaurus hi. In Italy, all agencies are using https://datatracker.ietf.org/doc/draft-polli-restapi-ld-keywords/03/ to tie JSON Schema and JSON-LD.
This spec was designed to avoid some issues arising from using potentially protected keywords in JSON Schema.
This is done by including JSON-LD content inside specific keywords, so that no conflicts with current or future JSON Schema versions arise.
@simontaurus here you can see a preliminary Swagger editor that interprets these keywords and renders the associated semantic resources: https://italia.github.io/swagger-editor/
@msporny wrote:
That is really neat; very powerful. Keep going!
Many thanks - glad you see some potential in this approach!
@anatoly-scherbakov wrote:
I am a frequent user of pydantic but haven't seen this generation tool before
After getting pyodide running there's now also a python playground. From pydantic it's also straightforward to generate JSON / OpenAPI schemas, especially via FastAPI
@ioggstream : Thanks for sharing. Great feature to have annotated OpenAPI schemas with rendering support in Swagger-UI!
Good to know that x-jsonld-* may be necessary to stay OpenAPI-conformant. On the downside, the documents are no longer valid JSON-LD remote contexts, but I guess it's not hard to transform between a root-level @context and a (nested) x-jsonld-context, especially when the OpenAPI schema is autogenerated from tools like pydantic/FastAPI => could be a nice use case to generate REST-API-LD schemas from OO-LD schemas to get access to your Swagger toolstack and backlink the original schema via externalDocs.
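Such a transformation is indeed mechanical. A minimal sketch, assuming the draft's `x-jsonld-context` keyword; the function names are hypothetical:

```python
# Sketch: move an OO-LD root-level @context into the x-jsonld-context
# keyword (per draft-polli-restapi-ld-keywords) and back again.
def to_restapi_ld(schema: dict) -> dict:
    """Nest a root-level @context under x-jsonld-context."""
    out = {k: v for k, v in schema.items() if k != "@context"}
    if "@context" in schema:
        out["x-jsonld-context"] = schema["@context"]
    return out

def to_remote_context(schema: dict) -> dict:
    """Inverse: lift x-jsonld-context back to a root-level @context."""
    out = {k: v for k, v in schema.items() if k != "x-jsonld-context"}
    if "x-jsonld-context" in schema:
        out["@context"] = schema["x-jsonld-context"]
    return out

oold = {"@context": {"schema": "http://schema.org/"}, "title": "Person"}
assert "x-jsonld-context" in to_restapi_ld(oold)
assert to_remote_context(to_restapi_ld(oold)) == oold  # lossless round trip
```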
Hello @VladimirAlexiev, you've got the wrong user tag after Semantic Treehouse. Commenting to help you find the right Wouter van den Berg :)
Hi @simontaurus,
After getting pyodide running there's now also a python playground. From pydantic it's also straightforward to generate JSON / OpenAPI schemas, especially via FastAPI
It seems a very cool playground! I'll probably reuse part of it for LD-keywords :) Maybe we could share some library code since it seems to me that the main difference is that OO-LD just puts the @context at the top level of an object.
x-jsonld-* may be necessary to stay OpenAPI conform
The issues are:
- OAS 3.0 does not allow custom properties that do not start with x-, and in Italy all our 12k agencies publish APIs in OAS 3.0
- OAS 3.1 uses newer JSON Schema versions, which have different constraints
After discussing with JSON-SCHEMA and JSON-LD folks (see https://github.com/json-ld/json-ld.org/issues/612#issuecomment-934518103) we identified this solution.
backlink via externalDocs
I'd avoid using externalDocs for specific use cases.
@ioggstream wrote:
Maybe we could share some library code
I would very much appreciate that!
@ioggstream wrote:
seems to me that the main difference is that OO-LD just puts the `@context` at the top level of an object
Regarding the context you are totally right. But in addition we also want to provide some extensions like multi-context mapping, multi-language support and, most importantly, annotations for the range of (object) properties, which are only validated syntactically as IRIs in JSON-SCHEMA.
E.g. a Person works_for an Organization:

```yaml
"@context":
  schema: http://schema.org/
  works_for: schema:worksFor
  type: "@type"
  id: "@id"
  demo: https://oo-ld.github.io/demo/
"@graph":
  - id: demo:person
    type: schema:Person
    works_for: demo:organization
  - id: demo:organization
    type: schema:Organization
```
can be expressed with

Person.schema.json:

```yaml
"@context":
  schema: http://schema.org/
  works_for: schema:worksFor
  type: "@type"
"$id": Person.schema.json
title: Person
type: object
properties:
  type:
    type: string
    const: schema:Person
  works_for:
    type: string
    x-oold-range:
      allOf:
        - "$ref": Organization.schema.json
    description: IRI pointing to an instance of schema:Organization
```
Organization.schema.json:

```yaml
"@context":
  schema: http://schema.org/
  type: "@type"
"$id": Organization.schema.json
title: Organization
type: object
properties:
  type:
    type: string
    const: schema:Organization
```
which would also translate to OWL

```turtle
schema:Person rdf:type owl:Class ;
    rdfs:subClassOf [
        rdf:type owl:Restriction ;
        owl:onProperty schema:worksFor ;
        owl:someValuesFrom schema:Organization
    ] .
```
and SHACL

```turtle
schema:PersonShape
    a sh:NodeShape ;
    sh:targetClass schema:Person ;
    sh:property [
        sh:path schema:worksFor ;
        sh:class schema:Organization
    ] .
```
but it would allow extending existing JSON-SCHEMA-based tools to make use of these annotations, e.g. in order to provide autogenerated autocomplete fields with the option to create a missing instance ad hoc with the specified schema, or to patch Python's `__getattribute__` to fetch and return a remote object instead of the IRI (see early work at oold-python)
Had a very interesting chat with @Michiel-s, @WvandenBerg (TNO), @robert-david, @skarampatakis, @amelak9 (SWC), @ivanov-petar (Ontotext) on collaboration for Semantic Vocabulary Hubs for Dataspaces, which includes the ability to capture and interlink schemas in a variety of formats, and federation between hubs.