pyld
Dates from schema.org are not compacted correctly.
I've traced this down to https://github.com/digitalbazaar/pyld/blob/master/lib/pyld/jsonld.py#L4132
The original symptom I observed was that various Date fields that should be compacted to keys like birthDate were actually including a schema: prefix.
It looks like the core problem is that the default inverse dictionary for @language is not getting populated correctly (perhaps due to a change in the published @context from schema.org?)
Date field specifications include both @id and @type, but not a language, and are still expected to be pure strings (according to schema.org's documentation). When the compactor attempts to find the correct term, _select_term returns None.
The inverse dictionary winds up looking something like this:
{
    'http://schema.org/birthDate': {'@none': {'@language': {},
                                              '@type': {'http://schema.org/Date': 'birthDate'}}},
    # etc
}
Instead of containing a {'@none': 'birthDate'} for @language, there's just an empty dict.
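To make the failure mode concrete, here is a deliberately simplified sketch of the lookup. The select_term function below is a hypothetical stand-in, not pyld's actual _select_term, and it assumes that a plain string is looked up under @language with @null and @none as the candidate values, which is my reading of the compaction algorithm rather than something confirmed in this thread.

```python
# Toy model of term selection against the inverse dictionary shown above.
# This is NOT pyld's implementation; names and lookup order are assumptions.
inverse = {
    "http://schema.org/birthDate": {
        "@none": {
            "@language": {},  # empty: no default term registered for plain strings
            "@type": {"http://schema.org/Date": "birthDate"},
        }
    }
}


def select_term(inverse, iri, type_or_language, preferred_values, container="@none"):
    """Return the first term registered under any preferred value, else None."""
    entry = inverse.get(iri, {}).get(container, {}).get(type_or_language, {})
    for value in preferred_values:
        if value in entry:
            return entry[value]
    return None


iri = "http://schema.org/birthDate"

# A plain string carries no @type, so (by assumption) it is looked up under @language.
plain = select_term(inverse, iri, "@language", ["@null", "@none"])

# A value typed as schema:Date is looked up under @type.
typed = select_term(inverse, iri, "@type", ["http://schema.org/Date", "@none"])

print(plain, typed)
```

With the empty @language entry, the plain-string lookup finds nothing, so the compactor would have to fall back to something other than the birthDate term, which is consistent with the schema: prefixed keys described above.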
Some other non-date fields also seem to exhibit this issue, but I don't know enough about the library or json-ld to know if these symptoms are actually problems, or if they're by design.
Minimum-reproducible sample:
#!/usr/bin/env python
from pyld import jsonld
doc = {
'http://schema.org/name': 'Buster the Cat',
'http://schema.org/birthDate': '2012',
'http://schema.org/deathDate': '2015-02-25'
}
frame = {
'@context': 'http://schema.org/'
}
framed = jsonld.frame(doc, frame)
contents = framed['@graph'][0]
print(framed)
assert 'name' in contents # fine
assert 'birthDate' in contents # not fine, schema:birthDate instead
assert 'deathDate' in contents # not fine, schema:deathDate instead
My proposal to fix this would be to apply https://github.com/Artory/pyld/commit/faaa1394dc32ecfdd6fd875f45e4fdbc931fa7cc, to attempt to set these defaults regardless of the outcome of the conditionals there.
I've run into this problem too. Unfortunately, the patch breaks the tests. Most of the failures are ordering differences and SSL errors, but some of them aren't. I'm not familiar enough with JSON-LD to know what is wrong with those, though.
@alantrick The tests have many failures now because the test suite and specs moved forward and the code hasn't caught up. Unless someone jumps in on this the code likely won't be updated until after the js library has also caught up.
As to the original problem, it can also be tested with just compaction. Here's the example with JSON quoting for easier playground use:
{
"http://schema.org/name": "Buster the Cat",
"http://schema.org/birthDate": "2012",
"http://schema.org/deathDate": "2015-02-25"
}
{
"@context": "http://schema.org/"
}
Note from the schema.org context, the lines:
"@vocab": "http://schema.org/",
"schema": "http://schema.org/",
"Date": {"@id": "schema:Date"},
"birthDate": { "@id": "schema:birthDate", "@type": "Date"},
"deathDate": { "@id": "schema:deathDate", "@type": "Date"},
"name": { "@id": "schema:name"},
The test input just has simple strings for the date values. If you change the input so the data has the expanded Date type, then the compaction will work (you can test all of this with compaction alone, without framing):
{
"http://schema.org/name": "Buster the Cat",
"http://schema.org/birthDate": {"@value": "2012", "@type": "http://schema.org/Date"},
"http://schema.org/deathDate": {"@value": "2015-02-25", "@type": "http://schema.org/Date"}
}
Output:
{
"@context": "http://schema.org/",
"birthDate": "2012",
"deathDate": "2015-02-25",
"name": "Buster the Cat"
}
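If the input data can't be changed at the source, one possible workaround is to wrap the plain date strings as typed values before compacting. The sketch below is only an illustration: type_plain_strings is a hypothetical helper (not part of pyld), it handles only top-level properties with a single string value, and the list of date properties is an assumption taken from the example above.

```python
import copy

SCHEMA_DATE = "http://schema.org/Date"
# Assumed list of properties to coerce; extend as needed for your data.
DATE_PROPERTIES = ("http://schema.org/birthDate", "http://schema.org/deathDate")


def type_plain_strings(doc, properties, type_iri):
    """Return a copy of doc where plain-string values of the given
    properties are wrapped as {"@value": ..., "@type": type_iri}."""
    result = copy.deepcopy(doc)
    for prop in properties:
        value = result.get(prop)
        if isinstance(value, str):
            result[prop] = {"@value": value, "@type": type_iri}
    return result


doc = {
    "http://schema.org/name": "Buster the Cat",
    "http://schema.org/birthDate": "2012",
    "http://schema.org/deathDate": "2015-02-25",
}

typed_doc = type_plain_strings(doc, DATE_PROPERTIES, SCHEMA_DATE)
print(typed_doc)
```

The result has the same shape as the typed input shown above, so it should compact to birthDate and deathDate in the same way; the original doc is left unmodified.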
Schema.org does say to use a string ISO 8601 date, but in JSON-LD it still needs to be typed. I don't recall the reason it's matching on types to do the term compaction. I think that's how it's supposed to work but I'm unsure. @gkellogg Perhaps you can confirm that if you have a moment?
Here's a simple, self-contained compaction test. Input:
{
"http://example.org/a": "A",
"http://example.org/b": "B",
"http://example.org/c": {"@value": "C", "@type": "urn:C"}
}
Context:
{
"@context": {
"ex": "http://example.org/",
"a": {"@id": "http://example.org/a"},
"b": {"@id": "http://example.org/b", "@type": "urn:B"},
"c": {"@id": "http://example.org/c", "@type": "urn:C"}
}
}
Output:
{
"@context": {
"ex": "http://example.org/",
"a": {
"@id": "http://example.org/a"
},
"b": {
"@id": "http://example.org/b",
"@type": "urn:B"
},
"c": {
"@id": "http://example.org/c",
"@type": "urn:C"
}
},
"a": "A",
"ex:b": "B",
"c": "C"
}
I don't recall the reason it's matching on types to do the term compaction. I think that's how it's supposed to work but I'm unsure.
When compacting, it's necessary to be sure that the value matches the term's definition, which includes both @type and @language. Otherwise, if it compacted to that term and used a string value, it would not expand back to its proper value.
ISO 8601 does allow a greater range of date styles than xsd:date does, which is fine, and nothing actually checks the value for conformance. But if the term says it's a schema:Date, then "2018" will expand to {"@value": "2018", "@type": "http://schema.org/Date"}, so the value must be of that form to be re-compacted using the same term.
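The round-trip argument can be sketched in a few lines. This is a toy model under stated assumptions, not the JSON-LD algorithm itself: expand_value and term_fits are hypothetical helpers, and they only consider @type coercion on string values (no @language, @id coercion, or containers).

```python
def expand_value(term_definition, value):
    """Toy expansion: apply the term's @type coercion to a plain string."""
    if isinstance(value, str):
        expanded = {"@value": value}
        if "@type" in term_definition:
            expanded["@type"] = term_definition["@type"]
        return expanded
    return value


def term_fits(term_definition, expanded_value):
    """A term is safe to use only if the bare string it would compact to
    expands back to exactly the same value."""
    return expand_value(term_definition, expanded_value["@value"]) == expanded_value


birth_date_term = {
    "@id": "http://schema.org/birthDate",
    "@type": "http://schema.org/Date",
}

untyped = {"@value": "2018"}
typed = {"@value": "2018", "@type": "http://schema.org/Date"}

print(term_fits(birth_date_term, untyped))  # the untyped value would gain a type on re-expansion
print(term_fits(birth_date_term, typed))    # the typed value round-trips unchanged
```

In this model the untyped string does not fit the term because expanding "2018" under it would add a type that was not in the original data, which is why a different key has to be chosen for it.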