pyLODE
Handle SDO-style schemas
schema.org doesn't use rdfs:domain and rdfs:range. Some other important ontologies have adopted this style, e.g. SSN, SOSA (according to @dr-shorthair), WOT TD. Our euBusinessGraph SWJ paper describes the benefits of such a lighter-weight approach.
Can you support this style in PyLODE? See https://github.com/essepuntato/LODE/issues/12 for details, and https://github.com/essepuntato/LODE/issues/13 for more details, and an implementation as a patch to LODE's XSL (doing this patching made me appreciate you decided to rewrite without using XSL).
You already have included two examples to test it: SSN, SOSA (I currently don't see any domain/range). Also test on schema.org, as it's pretty big.
https://github.com/RDFLib/pyLODE#annotations says that SDO props are supported:
- domains - rdfs:domain or schema:domainIncludes
- ranges - rdfs:range or schema:rangeIncludes
(as well as probably all the additional annotation props from my patch above).
But there's a bug then, compare:
- https://github.com/RDFLib/pyLODE/blob/master/pylode/examples/sosa.ttl#L87
- http://rawgit2.com/RDFLib/pyLODE/master/pylode/examples/sosa.html#hassample
Note: it turns out that SSN doesn't use SDO constructs, SOSA does.
pylode bails on SDO:
pylode -u https://schema.org/version/latest/schemaorg-current-http.ttl -c true -o schema.html
"Your RDF file does not define an ontology"
- Unquestionably, that's not a good practice: https://github.com/schemaorg/schemaorg/issues/2831
- But knowing how slow SDO have become in fixing things in the last 1.5 years, can pylode be more forgiving and work without owl:Ontology and rdfs:isDefinedBy?
After removing all unicode chars:
time pylode -i schema-with-added-ontology.ttl -c true -o schema.html
Finished. ontdoc documentation in schema.html
real 1m39.494s <<< but it's a big ontology (970k ttl)
user 0m0.015s
sys 0m0.094s
It looks about ok, with the following fixes needed:
- handle schema:domainIncludes, rangeIncludes
- treat classes that are also schema:DataType as datatypes:
schema:Boolean a schema:DataType, rdfs:Class ;
- handle dct:source, eg
schema:Brand a rdfs:Class ;
dct:source <http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_GoodRelationsClass> ;
- handle embedded HTML better. eg
A BreadcrumbList is an ItemList consisting of a chain of linked Web pages,
typically described using at least their URL and their name, and typically ending with the current page.<br/><br/>
The <a class="localLink" href="http://schema.org/position">position</a> property
is used to reconstruct the order of the items in a BreadcrumbList
The convention is that a breadcrumb list has an <a class="localLink" href="http://schema.org/itemListOrder">itemListOrder</a> of
<a class="localLink" href="http://schema.org/ItemListOrderAscending">ItemListOrderAscending</a>
is rendered as
A BreadcrumbList is an ItemList consisting of a chain of linked Web pages,
typically described using at least their URL and their name, and typically ending with the current page.\n\n
The [[position]] property is used to reconstruct the order of the items in a BreadcrumbList
The convention is that a breadcrumb list has an [[itemListOrder]] of [[ItemListOrderAscending]]
In other words, it parses the HTML and turns it into internal markdown, but then the markdown is not translated back to the corresponding HTML
- handle internal links: eg [[itemListOrder]] should become <a href='#itemListOrder'>itemListOrder</a>.
  - Note: this HTML link is handled correctly:
    See also the <a href="/docs/hotels.html">dedicated document on the use of schema.org for marking up hotels and other forms of accommodations</a>
  - I think it's also better to have pylode-generated links point to internal anchors, rather than to the semantic URL (eg https://schema.org/itemListOrder), because otherwise the links are broken until the file is published officially.
- handle markdown links, eg
  This corresponds to the [YearBuilt field in RESO](https://ddwiki.reso.org/display/DDW17/YearBuilt+Field)
  is rendered as the same plain text, rather than generating a HTML link
  - Note: this markdown link is handled correctly:
    (Source: Wikipedia see [https://en.wikipedia.org/wiki/Campsite](https://en.wikipedia.org/wiki/Campsite)). Here the name and the link are the same...
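To make the two link requests concrete, here is a minimal sketch of the conversion I have in mind. It is not pyLODE code: `links_to_html` and both regexes are hypothetical names, and it assumes the anchor for a term is simply its local name (as in `#itemListOrder`).

```python
import html
import re

# [[name]] is the internal-link form seen in the rendered description.
WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")
# [text](url) is an ordinary markdown link; URLs with spaces or parentheses are not handled.
MD_LINK = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def links_to_html(text: str) -> str:
    """Turn [[name]] into an internal anchor link and [text](url) into an external link."""
    # Assumption: the in-page anchor of a term is its local name, unchanged.
    text = WIKI_LINK.sub(
        lambda m: f'<a href="#{html.escape(m.group(1))}">{html.escape(m.group(1))}</a>',
        text,
    )
    text = MD_LINK.sub(
        lambda m: f'<a href="{html.escape(m.group(2))}">{html.escape(m.group(1))}</a>',
        text,
    )
    return text


print(links_to_html("has an [[itemListOrder]] of [[ItemListOrderAscending]]"))
```

The wiki-style links are replaced first, so the brackets are already gone by the time the markdown pattern runs.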
PS: Let me know if you want schema-with-removed-UTF8.ttl (the fixed input) and schema.html (the output)
I'm looking into this issue now
Compare these examples:
- https://rawcdn.githack.com/euBusinessGraph/eubg-data/master/ontology/_old/index.html made with my patched LODE (which is available at https://github.com/euBusinessGraph/eubg-data/tree/master/ontology/_old, mostly extraction.xsl)
- https://rawcdn.githack.com/euBusinessGraph/eubg-data/master/ontology/doc.html made with PyLODE (made 14m ago)
- source: https://raw.githubusercontent.com/euBusinessGraph/eubg-data/master/model/ebg-ontology.ttl
@VladimirAlexiev can you try the latest versions of pyLODE (v2.9.x) for this task? It should correctly handle domainIncludes and rangeIncludes. It won't handle a missing owl:Ontology declaration, so I think your best bet is just to add statements like that to the data before sending it to pyLODE. pyLODE does do some ontology building to cater for various property options, like different forms of class labels/descriptions, but I'm not keen to cater for no owl:Ontology as lots of things (i.e. all the metadata) are dependent on this Ontology declaration, so I'd rather a user specifically set the ontology in a pre-pyLODE step.
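For the pre-pyLODE step, a rough sketch of what adding the declaration could look like on the Turtle text itself. The function names are hypothetical, the "already declared" check is a crude text search, and it assumes the input ends with a complete statement. Doing this with an RDF library would be more robust, and which further metadata properties pyLODE wants on the ontology is not covered here.

```python
OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology"


def ontology_stub(ontology_iri: str) -> str:
    """Return one Turtle statement typing ontology_iri as an owl:Ontology."""
    # Full IRIs are used so the stub does not rely on prefixes declared in the input.
    return f"\n<{ontology_iri}> a <{OWL_ONTOLOGY}> .\n"


def with_ontology(turtle_text: str, ontology_iri: str) -> str:
    """Append the stub unless the text already appears to mention owl:Ontology."""
    # Crude check (assumption): any mention of owl:Ontology counts as a declaration.
    if "owl:Ontology" in turtle_text or OWL_ONTOLOGY in turtle_text:
        return turtle_text
    return turtle_text.rstrip() + "\n" + ontology_stub(ontology_iri)


sample = "@prefix schema: <http://schema.org/> .\nschema:Brand a <http://www.w3.org/2000/01/rdf-schema#Class> .\n"
print(with_ontology(sample, "http://schema.org/"))
```

The result can then be written to a file and passed to pylode with `-i`, as in the command earlier in this thread.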
https://github.com/eccenca/jod/issues/15 asks the same props to be handled in 3 namespaces: schema, dcam, dcid
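As an illustration of treating these properties uniformly, here is a small library-agnostic sketch that merges rdfs:domain/range with the schema: and dcam: variants. It is not how pyLODE or jod does it. It takes any iterable of (subject, predicate, object) triples, the function name is hypothetical, and the dcid namespace is left out because I don't want to guess its IRI.

```python
from collections import defaultdict

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SCHEMA = "http://schema.org/"  # the http variant, as in schemaorg-current-http.ttl
DCAM = "http://purl.org/dc/dcam/"

DOMAIN_PREDICATES = {RDFS + "domain", SCHEMA + "domainIncludes", DCAM + "domainIncludes"}
RANGE_PREDICATES = {RDFS + "range", SCHEMA + "rangeIncludes", DCAM + "rangeIncludes"}


def collect_domains_and_ranges(triples):
    """Map each property IRI to the set of its domain classes and range classes."""
    domains = defaultdict(set)
    ranges = defaultdict(set)
    for s, p, o in triples:
        p = str(p)  # str() so that IRI objects from an RDF library compare as plain strings
        if p in DOMAIN_PREDICATES:
            domains[str(s)].add(str(o))
        elif p in RANGE_PREDICATES:
            ranges[str(s)].add(str(o))
    return domains, ranges


# Illustrative triples only, not copied from any ontology file.
example = [
    ("http://example.org/hasSample", SCHEMA + "domainIncludes", "http://example.org/Feature"),
    ("http://example.org/hasSample", SCHEMA + "rangeIncludes", "http://example.org/Sample"),
]
domains, ranges = collect_domains_and_ranges(example)
print(dict(domains), dict(ranges))
```

Note that this merges the two styles only for display purposes: rdfs:domain has intersection semantics while domainIncludes is a union-style hint, so a documentation tool may still want to label them differently.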