Support rdfs:domain and rdfs:range in generated schema by `import-rdfs`

Open jo-fra opened this issue 11 months ago • 9 comments

When generating a LinkML schema using schemauto import-rdfs, the resulting LinkML schema does not incorporate the rdfs:domain and rdfs:range definitions.

E.g. the following excerpt from FOAF:

###  http://xmlns.com/foaf/0.1/knows
foaf:knows rdf:type owl:ObjectProperty ;
           rdfs:domain foaf:Person ;
           rdfs:range foaf:Person ;
           rdfs:comment "A person known by this person (indicating some level of reciprocated interaction between the parties)." ;
           rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> ;
           rdfs:label "knows" .

###  http://xmlns.com/foaf/0.1/Person
foaf:Person rdf:type owl:Class ;
            rdfs:subClassOf <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> ,
                            foaf:Agent ;
            owl:disjointWith foaf:Project ;
            rdfs:comment "A person." ;
            rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> ;
            rdfs:label "Person" .

results in the following LinkML schema:

slots:
  knows:
    comments:
    - A person known by this person (indicating some level of reciprocated interaction
      between the parties).
    slot_uri: foaf:knows

classes:
  Person:
    comments:
    - A person.
    is_a: Agent
    class_uri: foaf:Person

I would expect the rdfs:domain and rdfs:range of the foaf:knows property to be incorporated like this:

slots:
  knows:
    comments:
    - A person known by this person (indicating some level of reciprocated interaction
      between the parties).
    slot_uri: foaf:knows
    range: Person

classes:
  Person:
    comments:
    - A person.
    is_a: Agent
    class_uri: foaf:Person
    slots: 
    - knows
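
As a rough illustration of the mapping requested above, here is a minimal sketch that folds rdfs:range into the slot's range and rdfs:domain into the class's slots list. The triples are plain tuples of CURIE strings, and build_schema and local_name are hypothetical helpers written for this example; none of this is schema-automator's actual code.

```python
# Triples are plain (subject, predicate, object) tuples of CURIE strings.
# This is an illustrative stand-in, not schema-automator's data model.
TRIPLES = [
    ("foaf:knows", "rdf:type", "owl:ObjectProperty"),
    ("foaf:knows", "rdfs:domain", "foaf:Person"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
    ("foaf:Person", "rdf:type", "owl:Class"),
]


def local_name(curie: str) -> str:
    """Return the part of a CURIE after the prefix, e.g. 'foaf:Person' -> 'Person'."""
    return curie.split(":", 1)[1]


def build_schema(triples):
    """Hypothetical helper: fold rdfs:domain and rdfs:range into a schema dict."""
    schema = {"slots": {}, "classes": {}}

    # First pass: declare classes and slots from their rdf:type statements.
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj == "owl:Class":
            schema["classes"].setdefault(local_name(subject), {"class_uri": subject})
        elif predicate == "rdf:type" and obj == "owl:ObjectProperty":
            schema["slots"].setdefault(local_name(subject), {"slot_uri": subject})

    # Second pass: attach range to the slot and the slot to its domain class.
    for subject, predicate, obj in triples:
        slot_name = local_name(subject)
        if predicate == "rdfs:range":
            slot = schema["slots"].setdefault(slot_name, {"slot_uri": subject})
            slot["range"] = local_name(obj)
        elif predicate == "rdfs:domain":
            cls = schema["classes"].setdefault(local_name(obj), {"class_uri": obj})
            cls.setdefault("slots", []).append(slot_name)
    return schema


schema = build_schema(TRIPLES)
print(schema)
```

With the FOAF excerpt, the knows slot ends up with range Person, and Person lists knows under slots, which matches the expected YAML.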

jo-fra avatar Jan 07 '25 15:01 jo-fra

I believe I've fixed this in #152. Can you please test that branch to let me know if it resolves your issue?

multimeric avatar Jan 15 '25 02:01 multimeric

@multimeric I tested it with the latest commit (cfe7e15563ee1cf681b3eb266eb9ab8754f0c39c) of https://github.com/linkml/schema-automator/pull/152 and did not get the expected results. However, the diff shows that some changes in schema_automator/importers/rdfs_import_engine.py were reverted when merging from https://github.com/linkml/schema-automator/pull/151, e.g.:

https://github.com/linkml/schema-automator/pull/152/commits/cfe7e15563ee1cf681b3eb266eb9ab8754f0c39c#diff-b6464c40227100611b000caf1086c351935db40bcf0f78ca606ca68d94e5ca3fL37-L43:

<<<<<<< HEAD
    "domain_of": [HTTP_SDO.domainIncludes, SDO.domainIncludes, RDFS.domain],
    "range": [HTTP_SDO.rangeIncludes, SDO.rangeIncludes, RDFS.range],
=======
    "domain_of": [HTTP_SDO.domainIncludes, SDO.domainIncludes],
    "rangeIncludes": [HTTP_SDO.rangeIncludes, SDO.rangeIncludes],
>>>>>>> cleanup-deps

After testing it with commit 65869ba4d7739302e352ac02a8354a5746e89f34, from before the merge, rdfs:domain and rdfs:range are incorporated as expected!

Was the revert of those changes unintended?
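
To make the effect of the two sides of that conflict concrete, here is a small sketch of how a table of predicate lists might be consulted. It assumes each key is filled from statements whose predicate appears in its list; the collect function and the string constants are illustrative only and are not taken from rdfs_import_engine.py.

```python
# Predicate IRIs as plain strings (the real code uses rdflib namespaces).
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
SDO_DOMAIN_INCLUDES = "https://schema.org/domainIncludes"
SDO_RANGE_INCLUDES = "https://schema.org/rangeIncludes"

# Simplified versions of the two sides of the merge conflict.
WITH_RDFS = {
    "domain_of": [SDO_DOMAIN_INCLUDES, RDFS_DOMAIN],
    "range": [SDO_RANGE_INCLUDES, RDFS_RANGE],
}
WITHOUT_RDFS = {
    "domain_of": [SDO_DOMAIN_INCLUDES],
    "rangeIncludes": [SDO_RANGE_INCLUDES],
}


def collect(mapping, statements):
    """Hypothetical lookup: gather objects whose predicate is listed for a key."""
    result = {}
    for key, predicates in mapping.items():
        values = [obj for pred, obj in statements if pred in predicates]
        if values:
            result[key] = values
    return result


# The (predicate, object) statements about foaf:knows from the FOAF excerpt.
FOAF_KNOWS = [(RDFS_DOMAIN, "foaf:Person"), (RDFS_RANGE, "foaf:Person")]

with_rdfs = collect(WITH_RDFS, FOAF_KNOWS)
without_rdfs = collect(WITHOUT_RDFS, FOAF_KNOWS)
print(with_rdfs)
print(without_rdfs)
```

Under that assumption, the table without the RDFS entries matches nothing for foaf:knows, which would be consistent with the domain and range going missing after the merge.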

Just two caveats:

  1. the generated schema has default_prefix: example but it is not defined in prefixes:

    prefixes:
      linkml: https://w3id.org/linkml/
      dc: http://purl.org/dc/elements/1.1/
      vs: http://www.w3.org/2003/06/sw-vocab-status/ns#
      owl: http://www.w3.org/2002/07/owl#
      wot: http://xmlns.com/wot/0.1/
      foaf: http://xmlns.com/foaf/0.1/
      rdfs: http://www.w3.org/2000/01/rdf-schema#
    default_prefix: example
    
  2. All datatype properties with rdfs:range rdfs:Literal are generated with range: Literal e.g.

    slots:
      jabberID:
        comments:
        - A jabber ID for something.
        slot_uri: foaf:jabberID
        range: Literal
    

    However Literal is unrecognized and I am getting this error when trying to run gen-python with this schema:

    gen-python foaf_schema.yaml
    ValueError: File "foaf_schema.yaml", line 21, col 12 slot: jabberID - unrecognized range (Literal)
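
    Both caveats can be spotted before running a generator. The following is a minimal sketch, assuming the schema has already been loaded into a plain dict; find_problems is a hypothetical helper, and KNOWN_TYPES is only a small, non-exhaustive subset of the types from linkml:types.

```python
# A few type names provided by linkml:types (assumption: deliberately not exhaustive).
KNOWN_TYPES = {"string", "integer", "boolean", "float", "uri"}


def find_problems(schema: dict) -> list:
    """Hypothetical check for an undeclared default_prefix and unknown slot ranges."""
    problems = []

    default_prefix = schema.get("default_prefix")
    if default_prefix is not None and default_prefix not in schema.get("prefixes", {}):
        problems.append(f"default_prefix '{default_prefix}' is not declared in prefixes")

    # A range is treated as known if it names a built-in type or a schema element.
    known = set(KNOWN_TYPES)
    for section in ("classes", "types", "enums"):
        known |= set(schema.get(section, {}))

    for name, slot in schema.get("slots", {}).items():
        slot_range = slot.get("range")
        if slot_range is not None and slot_range not in known:
            problems.append(f"slot '{name}' has unrecognized range '{slot_range}'")
    return problems


# A cut-down version of the generated FOAF schema shown above.
foaf_schema = {
    "prefixes": {
        "linkml": "https://w3id.org/linkml/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "default_prefix": "example",
    "slots": {"jabberID": {"slot_uri": "foaf:jabberID", "range": "Literal"}},
    "classes": {"Person": {"class_uri": "foaf:Person"}},
}

problems = find_problems(foaf_schema)
for problem in problems:
    print(problem)
```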
    

jo-fra avatar Jan 24 '25 09:01 jo-fra

Thanks for the report. It probably was just a faulty merge. I'll likely fix it early next week.

multimeric avatar Jan 24 '25 10:01 multimeric

Okay, I've rebased and hopefully fixed the underlying issue.

multimeric avatar Jan 31 '25 05:01 multimeric

@multimeric Thanks, I tried it with the latest commit and it now includes rdfs:domain and rdfs:range.

Only these two issues still persist:

  1. the generated schema has default_prefix: example but it is not defined in prefixes:

    prefixes:
      linkml: https://w3id.org/linkml/
      dc: http://purl.org/dc/elements/1.1/
      vs: http://www.w3.org/2003/06/sw-vocab-status/ns#
      owl: http://www.w3.org/2002/07/owl#
      wot: http://xmlns.com/wot/0.1/
      foaf: http://xmlns.com/foaf/0.1/
      rdfs: http://www.w3.org/2000/01/rdf-schema#
    default_prefix: example
    
  2. All datatype properties with rdfs:range rdfs:Literal are generated with range: Literal e.g.

    slots:
      jabberID:
        comments:
        - A jabber ID for something.
        slot_uri: foaf:jabberID
        range: Literal
    

    However Literal is unrecognized and I am getting this error when trying to run gen-python with this schema:

    gen-python foaf_schema.yaml
    ValueError: File "foaf_schema.yaml", line 21, col 12 slot: jabberID - unrecognized range (Literal)
    

jo-fra avatar Jan 31 '25 08:01 jo-fra

Hmm, I can't replicate this Literal issue. If I run schemauto import-rdfs on the following ttl:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

foaf:knows rdf:type owl:ObjectProperty ;
           rdfs:domain foaf:Person ;
           rdfs:range foaf:Person ;
           rdfs:comment "A person known by this person (indicating some level of reciprocated interaction between the parties)." ;
           rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> ;
           rdfs:label "knows" .

foaf:Person rdf:type owl:Class ;
            rdfs:subClassOf <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> ,
                            foaf:Agent ;
            owl:disjointWith foaf:Project ;
            rdfs:comment "A person." ;
            rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> ;
            rdfs:label "Person" .

I get:

name: example
id: http://example.org/example
imports:
- linkml:types
prefixes:
  linkml: https://w3id.org/linkml/
  foaf: http://xmlns.com/foaf/0.1/
default_prefix: example
default_range: string
slots:
  knows:
    comments:
    - A person known by this person (indicating some level of reciprocated interaction
      between the parties).
    slot_uri: foaf:knows
    range: Person
classes:
  Agent:
    class_uri: foaf:Agent
  SpatialThing:
    class_uri: ns1:SpatialThing
  Person:
    comments:
    - A person.
    is_a: Agent
    slots:
    - knows
    class_uri: foaf:Person

multimeric avatar Jan 31 '25 11:01 multimeric

You're right that the default prefix is messed up, and I think I need some input from the maintainers on what to do about that, but to be honest you should always pass in a name and model_uri. The schema won't make much sense otherwise. So for foaf you would do something like:

poetry run schemauto import-rdfs --format xml http://xmlns.com/foaf/spec/index.rdf --schema-name foaf --model-uri http://xmlns.com/foaf/0.1/

multimeric avatar Jan 31 '25 12:01 multimeric

@multimeric @jo-fra - agree the default 'example' is confusing and we may just want to get rid of that in the automated step. But agree with @multimeric that passing in an actual value here is a great standard practice. We tend to think of schema-automator as a bootstrapping tool, that users will interact with to get them most of the way towards a working schema, but that they will have to edit to add finishing touches. Schema-automator is getting so much better with these fixes; thank you!

sierra-moxon avatar Jan 31 '25 16:01 sierra-moxon

Okay I've just pushed a new change. Firstly, it removes the custom example default in the RDFS importer in favour of letting the SchemaBuilder handle it. Secondly, it tries to infer the schema metadata from RDF. Basically if the name is not provided explicitly, the most common prefix it finds becomes the name. If the id is not explicitly provided, then the corresponding URI becomes the ID. So for FOAF it would determine that foaf is used a ton in the document and therefore schema.name = "foaf" and schema.id = http://xmlns.com/foaf/0.1/.
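
The inference described here can be sketched roughly as follows. This is not the code from the pull request: infer_name_and_id is a hypothetical function, and it assumes the prefix map and the list of IRIs used in the document are already available as plain Python values.

```python
from collections import Counter


def infer_name_and_id(prefixes: dict, uris: list):
    """Hypothetical sketch: pick the most frequently used prefix as the schema name."""
    counts = Counter()
    for uri in uris:
        # Prefer the longest matching namespace so nested namespaces resolve sensibly.
        matches = [p for p, ns in prefixes.items() if uri.startswith(ns)]
        if matches:
            counts[max(matches, key=lambda p: len(prefixes[p]))] += 1
    if not counts:
        return None, None
    name, _ = counts.most_common(1)[0]
    # The namespace bound to the winning prefix becomes the schema id.
    return name, prefixes[name]


prefixes = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
uris = [
    "http://xmlns.com/foaf/0.1/knows",
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/Agent",
    "http://www.w3.org/2000/01/rdf-schema#label",
]

name, schema_id = infer_name_and_id(prefixes, uris)
print(name, schema_id)
```

An explicitly provided name or id would still take precedence; the sketch only covers the fallback case.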

multimeric avatar Feb 03 '25 01:02 multimeric