yaml-ld
Define anchor usage in yaml-ld
As a JSON-LD editor … WHO I want to use YAML anchors … WHAT So that I can easily reuse content … WHY
Note
The specification should define:
- when it is legitimate to use anchors
- what the expectations on anchor usage are (e.g. do they represent a specific JSON-LD node, or can they just be used to represent content?)
- whether there are any constraints on anchor usage (e.g. the representation graph MAY / MUST NOT be a cyclic graph...)
example 1
---
- "@id": &homer http://example.org/#homer  # Anchor the homer url
  http://example.com/vocab#name:
    - "@value": Homer
- "@id": http://example.org/#bart
  http://example.com/vocab#name:
    - "@value": Bart
  http://example.com/vocab#parent:
    - "@id": *homer  # reuse the anchor instead of re-typing the homer url
- "@id": http://example.org/#lisa
  http://example.com/vocab#name:
    - "@value": Lisa
  http://example.com/vocab#parent:
    - "@id": *homer
example 2
Using anchor and alias nodes https://gist.github.com/ioggstream/31f3226fa9976b3baf0800f44bc19c98
- Example 2 is from the d3fend.mitre.org cybersecurity ontology.
- YAML spec: https://yaml.org/spec/1.2.2/#anchors-and-aliases
- Anchors and Aliases can represent non-tree graph structures, whereas JSON is a tree
- The above are somewhat atypical examples of reusing small fragments of YAML. The typical example is reusing a whole RDF node, which in JSON-LD happens by @id. Nevertheless, using YAML Anchors and Aliases ensures referential integrity within the document (that the @id is not mistyped).
- We should describe how Anchors and Aliases could mesh with JSON-LD Frames
One point where I believe YAML anchors can help is the description of complex contexts. E.g.
{
  "@context": {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://example.com/ns/Company/",
    "founder": { "@context": {
      "@vocab": "http://example.com/ns/Person/",
      "birthDate": { "@type": "xsd:date" }
    }},
    "employee": { "@context": {
      "@vocab": "http://example.com/ns/Person/",
      "birthDate": { "@type": "xsd:date" }
    }}
  }
}
Notice that the scoped contexts of founder and employee are exactly the same (a "person" context). With YAML anchors, this redundancy could be eliminated.
NB: there are other means to get rid of this redundancy in pure JSON-LD:
- hosting the "person" context at a different URL and using that URL instead
- defining a type-scoped context for a type Person, and expecting values of founder and employee to be explicitly typed
but they have drawbacks that are not always acceptable.
That's exactly the kind of discussion and examples we need :)
"@context":
  xsd: http://www.w3.org/2001/XMLSchema#
  "@vocab": http://example.com/ns/Company/
  founder:
    "@context": &person-context
      "@vocab": http://example.com/ns/Person/
      birthDate:
        "@type": xsd:date
  employee:
    "@context": *person-context
how Anchors and Aliases could mesh with JSON-LD Frames
Frames specify which nodes to expand, and which nodes to merely refer to by URI. So in some sense they tackle the "graph vs tree" problem.
Anchors and Aliases tackle the same problem; intuitively I feel in a more general way.
So: what can be the connection between them?
I am not entirely clear on how anchors would actually affect the LD part of the picture. Having a YAML document with anchors, we're going to convert it to JSON — and in that conversion, the anchors will be resolved. Thus, a JSON-LD processor that we will subsequently use won't know anything about those anchors.
This is similar to C preprocessor directives which are resolved before the source file is consumed by the compiler itself.
Is that right? If yes, can't we safely ignore these particular YAML features relying upon YAML spec to describe them?
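For what it's worth, a quick sketch with PyYAML and the standard json module seems to confirm this for the scoped-context example earlier in the thread. It assumes PyYAML is installed; other YAML loaders may differ in details, and the variable names are only illustrative.

```python
import json
import yaml  # PyYAML, assumed to be installed

yaml_ld = """
"@context":
  xsd: http://www.w3.org/2001/XMLSchema#
  "@vocab": http://example.com/ns/Company/
  founder:
    "@context": &person-context
      "@vocab": http://example.com/ns/Person/
      birthDate:
        "@type": xsd:date
  employee:
    "@context": *person-context
"""

# The YAML loader resolves the alias; the result is plain dicts and strings.
document = yaml.safe_load(yaml_ld)
founder = document["@context"]["founder"]["@context"]
employee = document["@context"]["employee"]["@context"]
assert founder == employee

# The JSON that a JSON-LD processor would receive carries no trace of the anchor:
# the shared context is simply written out twice.
as_json = json.dumps(document, indent=2)
assert "person-context" not in as_json
print(as_json)
```

So at least with this toolchain, the anchor behaves like a preprocessor macro: it is gone by the time the JSON exists.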
Of course, JSON-LD does encode a graph in JSON; what used to be called a node reference is of the form {"@id": "..."}. Framing has an @embed keyword that can control how this works with one or all instances of a node referenced either fully or as a reference.
The YAML anchor/alias mechanism is similar to the framing keys, and also similar in concept to the @included keyword.
For now, I think we need to be cautious about depending on any YAML features beyond JSON re-serialization until we understand the requirements for round-tripping. A YAML-LD extended profile could allow us to move beyond what can easily be represented in JSON-LD, and we need to tread carefully.
Anchors can be used to define fragment IDs inside YAML instance data, like attributes @id and href/@name do in HTML.
@ioggstream where was your proposal for such fragments? In addition to anchors, it used JSON Path to address any element in the JSON/YAML structure.
Eg if at https://example.com/TheSimpsons.yaml we have:
*Bart:
  name: Bart Simpsons
  gender: male
Then the alias would be resolved to https://example.com/TheSimpsons.yaml#Bart
The same in plain YAML-LD would look like this:
- "@id": Bart
  name: Bart Simpsons
  gender: male
--
@anatoly-scherbakov basically says that anchors/aliases must be resolved by the YAML processor and elided, i.e. anchors can only be used locally inside one file. Furthermore, the shared info must be copied out during the resolution. I like @pchampin's concrete example of using aliases to express a context more economically. But being a graph person, I dislike expanding shared graph structures by copying them out.
--
If anchor-based data sharing is necessarily local (limited to one file), then perhaps we can use it at least for blank nodes and avoid copying? Eg
valve1:
  temperature: &temp100C
    value: 100
    unit: degC
valve2:
  temperature: *temp100C
Should result in this turtle
<valve1> :temperature _:temp100C.
<valve2> :temperature _:temp100C.
_:temp100C :value 100; :unit <degC>.
and NOT this one:
<valve1> :temperature [:value 100; :unit <degC>].
<valve2> :temperature [:value 100; :unit <degC>].
@VladimirAlexiev let me try to clarify your examples:
Syntax tweak. A keyword cannot start with *; Anchor is attached to a node.
Bart: &BartSimpsons  # create an anchor to this node.
  name: Bart Simpsons
  gender: male
I don't think that this can implicitly map to an @id: Bart, because anchors are a serialization detail. The above document can legitimately be serialized as
Bart: &anchor001  # create an anchor to this node.
  name: Bart Simpsons
  gender: male
Homer:
  children:
    - *anchor001  # An Alias references an anchor.
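A quick way to see that anchor names are a serialization detail, sketched with PyYAML (assumed installed): load a document and dump it again. The original name BartSimpsons does not survive the round trip, and the emitter invents its own. PyYAML happens to generate names of the form id001; other emitters may choose differently.

```python
import yaml  # PyYAML, assumed to be installed

doc = """
Bart: &BartSimpsons
  name: Bart Simpsons
  gender: male
Homer:
  children:
    - *BartSimpsons
"""

data = yaml.safe_load(doc)

# After loading, the alias is just a second reference to the same dict.
assert data["Homer"]["children"][0] is data["Bart"]

# Dumping re-discovers the sharing and emits a freshly generated anchor name.
redumped = yaml.safe_dump(data)
print(redumped)
```

This is why deriving an @id from the anchor name looks fragile: two serializations of the same representation graph can carry different names.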
Representation graph
iiuc the yaml below
t100: &t100 100
valve1:
  temperature: &temp100C
    value: *t100
    unit: degC
valve2:
  temperature: *temp100C
maps to the following YAML rep. graph
graph LR;
root --> t100 & valve1 & valve2
t100 --> 100
valve1 --> temperature1[temperature] -->temp100C --> value & unit
value --> t100
unit --> degC
valve2 --> temperature2[temperature] -->temp100C
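One way to inspect this graph directly is PyYAML's yaml.compose, which returns the node graph before any Python objects are constructed. A small sketch, assuming PyYAML is installed; the helper child is made up for illustration and is not part of PyYAML.

```python
import yaml  # PyYAML, assumed to be installed

doc = """
t100: &t100 100
valve1:
  temperature: &temp100C
    value: *t100
    unit: degC
valve2:
  temperature: *temp100C
"""

def child(mapping_node, key):
    """Return the value node stored under `key` in a composed MappingNode."""
    for key_node, value_node in mapping_node.value:
        if key_node.value == key:
            return value_node
    raise KeyError(key)

root = yaml.compose(doc, Loader=yaml.SafeLoader)

temp1 = child(child(root, "valve1"), "temperature")
temp2 = child(child(root, "valve2"), "temperature")

# Both "temperature" keys point at one and the same mapping node.
assert temp1 is temp2

# The scalar 100 is shared too: the alias under "value" is the anchored node itself.
assert child(temp1, "value") is child(root, "t100")
```

In the composed graph there is no separate alias node: each alias has already been replaced by the node it refers to, so the graph is a DAG rather than a tree.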
The first question I asked myself is: how does pyyaml process this information?
pyyaml preserves references when parsing mutable structures into a dict()
import yaml

# temperature_yaml holds the YAML document shown above
temperature = yaml.safe_load(temperature_yaml)
assert temperature['valve1']['temperature']['value'] == 100
assert temperature['valve2']['temperature']['value'] == 100
# assign a new temperature
temperature['valve1']['temperature']['value'] = 200
assert temperature['valve2']['temperature']['value'] == 200 # Changed.
but when acting on an immutable structure, things change
assert temperature["t100"] == 100
assert temperature['valve2']['temperature']['value'] == 100
temperature["t100"] = 200
assert temperature['valve2']['temperature']['value'] == 100 # Not changed.
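Building on the shared reference that pyyaml keeps for mappings, here is a minimal sketch of the blank-node idea proposed earlier in the thread. to_triples is a hypothetical helper, not part of any YAML-LD or JSON-LD library. It treats top-level keys as subjects, nested mappings as blank nodes, and everything else as a plain value, so it does not distinguish <degC> from "degC". To stay dependency-free, the shared structure is built by hand in the shape the asserts above show pyyaml producing.

```python
import json

def to_triples(data):
    """Turn {subject: {predicate: value}} into triples.

    Nested dicts become blank nodes; the same dict object always
    gets the same blank node label (identity, not equality).
    """
    labels = {}   # id(dict) -> blank node label
    triples = []

    def term(value):
        if not isinstance(value, dict):
            return value
        key = id(value)
        if key not in labels:
            labels[key] = f"_:b{len(labels)}"
            for pred, obj in value.items():
                triples.append((labels[key], pred, term(obj)))
        return labels[key]

    for subject, props in data.items():
        for pred, obj in props.items():
            triples.append((f"<{subject}>", pred, term(obj)))
    return triples

# One shared mapping, as an anchor plus alias would give after loading.
temp = {"value": 100, "unit": "degC"}
shared = {"valve1": {"temperature": temp}, "valve2": {"temperature": temp}}
shared_triples = to_triples(shared)

# A JSON round trip copies the mapping out, so the sharing is lost.
copied = json.loads(json.dumps(shared))
copied_triples = to_triples(copied)

print(len(shared_triples), len(copied_triples))
```

With the shared object, both valves point at one blank node; after the JSON round trip each valve gets its own copy, which is the second Turtle form above.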
Sharing and Cycles (Frames)
Frames are quite key because they define which part of an RDF graph to unroll into a JSON tree, and how.
@gkellogg in #44
The JSON-LD Framing algorithm is quite complicated as it is.
Agreed, and I don't even know it properly. Of course, we'll use it whole-cloth without modification.
But I intuitively feel that anchors may have something to do with Frames because both address (to some degree) the problem "given a graph, how to serialize part of it as a tree". Both allow sharing objects and handling cycles (to avoid infinite embedding), but:
- JSON-LD can share RDF nodes and nothing else
- YAML-LD anchors can share finer-grain structures: node URLs, single literals, pieces of objects (similar to @included)
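On cycles specifically: YAML allows an alias to point back at an enclosing node, which plain JSON cannot express without something like an @id reference. A sketch with PyYAML (assumed installed) and a made-up Simpsons document, showing what a cyclic representation graph does to the JSON conversion step:

```python
import json
import yaml  # PyYAML, assumed to be installed

doc = """
homer: &homer
  name: Homer
  children:
    - name: Bart
      parent: *homer   # points back to an enclosing node: a cycle
"""

data = yaml.safe_load(doc)
bart = data["homer"]["children"][0]

# The loaded structure really is cyclic.
assert bart["parent"] is data["homer"]

# A straight conversion to JSON cannot represent it.
try:
    json.dumps(data)
    json_error = None
except ValueError as exc:
    json_error = exc

print(json_error)
```

This may be a practical argument for the "MUST NOT be cyclic" option in the issue description, at least for any profile that is defined via conversion to JSON.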
Modularity/Structuring
@pchampin
anchors can help in the description of complex contexts
JSON Schema has special modularity/structuring facilities, see https://json-schema.org/understanding-json-schema/structuring.html
- JSON-LD doesn't have such advanced facilities, so JSON-LD contexts tend to be gigantic.
- in a specific project we've assembled https://github.com/gs1/EPCIS/blob/master/epcis-context.jsonld from a bunch of files in https://github.com/gs1/EPCIS/tree/master/JSON-LD-Context, leading to numerous bugs https://github.com/gs1/EPCIS/issues/307 (in particular see Bug 6 duplication)
- JSON-LD modularity is based on including contexts by URL
- But just like YAML anchors vs node URLs, Schema inclusion feels finer-granularity than context inclusion
- Schema even has $anchor that's very similar to YAML anchors (but not used as often as JSON Pointers and the "standard place" $defs)
- If we adopt #54, we should think about "merging" JSON Schema anchors with YAML anchors
So the question of YAML fragments and pointers, and how they relate to Schema fragments and JSON Pointers, is key. @ioggstream has been struggling with this problem: please take charge of this, keep up the fight, and we'll help as much as we can!
Syntax tweak
Thanks!
Representation graph
Yes, but the alias "nodes" t100, temp100C are quite different from the others because they carry no info and instead are just redirection pointers (so maybe use a different color).
This issue was discussed at the Aug 03 meeting.