Improve Header and Metadata of OWL dumps

Open jmkeil opened this issue 11 months ago • 5 comments

The header and metadata of the OWL dumps could be improved.

Here is a current header example:

<rdf:RDF xmlns="http://purl.obolibrary.org/obo/chebi.owl#"
     xml:base="http://purl.obolibrary.org/obo/chebi.owl"
     xmlns:chebi1="http://purl.obolibrary.org/obo/chebi#3"
     xmlns:chebi2="http://purl.obolibrary.org/obo/chebi#"
     xmlns:chebi3="http://purl.obolibrary.org/obo/chebi#1"
     xmlns:chebi="http://purl.obolibrary.org/obo/chebi#2"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:obo="http://purl.obolibrary.org/obo/">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/chebi.owl">
        <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/chebi/225/chebi.owl"/>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ChEBI subsumes and replaces the Chemical Ontology first</rdfs:comment>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Author: ChEBI curation team</rdfs:comment>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">developed by Michael Ashburner &amp; Pankaj Jaiswal.</rdfs:comment>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ChEBI Release version 225</rdfs:comment>
        <oboInOwl:saved-by rdf:datatype="http://www.w3.org/2001/XMLSchema#string">chebi</oboInOwl:saved-by>
        <oboInOwl:date rdf:datatype="http://www.w3.org/2001/XMLSchema#string">27:08:2023 19:12</oboInOwl:date>
        <oboInOwl:hasOBOFormatVersion rdf:datatype="http://www.w3.org/2001/XMLSchema#string">1.2</oboInOwl:hasOBOFormatVersion>
        <oboInOwl:default-namespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">chebi_ontology</oboInOwl:default-namespace>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">For any queries contact [email protected]</rdfs:comment>
    </owl:Ontology>

Some thoughts on what could be improved:

  • cleanup namespaces
    • let the prefix chebi refer to the namespace http://purl.obolibrary.org/obo/chebi#
    • remove unnecessary namespaces (chebi1, chebi2, chebi3, chebi4, xml, xsd); otherwise they survive further processing and end up being used for IRIs such as http://purl.obolibrary.org/obo/chebi#3_STAR
  • owl:versionIRI should incorporate LITE / CORE / FULL
  • owl:versionInfo, dcterms:creator, schema:email/foaf:mbox should be used instead of rdfs:comment (see Metadata usage in WIDOCO)
  • add dc:license for license information
  • maybe add schema:discussionUrl
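
One way to check which declared prefixes are really unnecessary is to compare the namespace declarations with the namespaces of the element and attribute names in the file. In RDF/XML a prefix only matters for those names; IRIs in rdf:about or rdf:datatype are written out in full. The following is a minimal Python sketch under that assumption. The function name unused_prefixes and the shortened sample header are illustrative and not part of any ChEBI tooling.

```python
import io
import xml.etree.ElementTree as ET

# Shortened, illustrative header (not the full ChEBI header).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:chebi1="http://purl.obolibrary.org/obo/chebi#3"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
    <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/chebi.owl"/>
</rdf:RDF>"""


def unused_prefixes(xml_text):
    """Return declared prefixes whose namespace no element or attribute name uses."""
    declared = {}  # prefix -> namespace IRI
    used = set()   # namespace IRIs seen in element or attribute names
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            declared[prefix] = uri
        else:
            # ElementTree spells qualified names as "{namespace}local".
            for name in [payload.tag, *payload.attrib.keys()]:
                if name.startswith("{"):
                    used.add(name[1:].split("}", 1)[0])
    return sorted(p for p, uri in declared.items() if uri not in used)


print(unused_prefixes(SAMPLE))
```

Run against a full dump, this would only report prefixes that no element or attribute name relies on; it does not by itself explain why the serializer declared them.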

jmkeil avatar Sep 04 '23 11:09 jmkeil

Thanks @jmkeil!

Additionally:

  • Don't split sentences across different rdfs:comment statements; these are not ordered. (I am guessing no one has looked at the header in ~20 years; there are probably better ways to credit the original work by Michael and @jaiswalp.)

I would offer different advice from @jmkeil though, CHEBI should be consistent with OBO Metadata standards and OMO (https://obofoundry.org/ontology/omo)

  • I think versionIRI should be kept as is. But if CHEBI were to make different subsets for LITE/CORE these would have their own URIs and would be versioned with distinct IRIs
  • use dcterms:contributor over dcterms:creator, using orcid https URIs (but if there is not an attempt to populate this consistently it should be left out)
  • use dcterms:license (I am surprised there is not an open issue about this already, cc @matentzn)

@jmkeil you may want to petition OMO to include schema:discussionUrl or schema:email, see https://github.com/information-artifact-ontology/ontology-metadata/issues

cmungall avatar Sep 05 '23 14:09 cmungall

Hi all, many points of the above are already resolved in our current development for ChEBI 2.0 here.

  1. We keep only one chebi prefix throughout the ontology: http://purl.obolibrary.org/obo/chebi/ (not #)
  2. The extra namespaces chebi1, chebi2, chebi3, chebi4 have been quite difficult to delete. We now use robot to generate the ontology (a completely different process from before, when an OBO file was used and robot just converted it to OWL), and we suspect that because the resource starts with a number (i.e. 1_STAR, 2_STAR, 3_STAR) robot does not understand it and the generated OWL file has those extra namespaces. They are not used anywhere in the ontology.
  3. RDF comments were unified.
  4. As @cmungall mentioned, versionIRI is kept as it is.
  5. dcterms:license was added, and dcterms:creator as well. We will certainly check whether it is possible to use dcterms:contributor.
  6. We are using a number to tag a version, but I realise that it is even better to use the release's current date in the IRI, we'll re-check this as well.
  7. Other things like xmls types and foaf:homepage were added.

@cmungall we are planning to have LITE and CORE variants as well, as you can see on the FTP link, so yes, we would need to generate other ontology IRIs. I guess we would need to include them in the PURL repository?

Last but not least, @jmkeil and @cmungall, you can start to play with the new ontology; just bear in mind that it is still in development and not official.

CarMoreno avatar Sep 05 '23 15:09 CarMoreno

Hi all, many points of the above are already resolved in our current development for ChEBI 2.0 here.

Awesome!

we suspect that because the resource starts with a number (i.e. 1_STAR, 2_STAR, 3_STAR)

Why not make new PURLs for your subsets?

We are using a number to tag a version, but I realise that it is even better to use the release's current date in the IRI, we'll re-check this as well.

It's more conventional in OBO to use ISO-8601-based versionIRIs (please don't invent a different way!), but bear in mind that you'll need to support the old versionIRIs and you can't retroactively assign date-based versionIRIs to these.

https://obofoundry.org/principles/fp-004-versioning.html
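
As a rough illustration of the date-based approach, the sketch below converts the oboInOwl:date value from the current header ("27:08:2023 19:12") into an ISO 8601 date and places it in a versionIRI. The IRI layout, with the date as a path segment in place of the release number, is an assumption modelled on the existing numeric versionIRI; the function names are hypothetical.

```python
from datetime import datetime


def iso_release_date(obo_date):
    """Convert an OBO-style 'DD:MM:YYYY HH:MM' stamp to an ISO 8601 date."""
    return datetime.strptime(obo_date, "%d:%m:%Y %H:%M").date().isoformat()


def dated_version_iri(obo_date, base="http://purl.obolibrary.org/obo/chebi"):
    """Build a date-based versionIRI (assumed layout: <base>/<date>/chebi.owl)."""
    return f"{base}/{iso_release_date(obo_date)}/chebi.owl"


print(dated_version_iri("27:08:2023 19:12"))
```

Existing numeric versionIRIs such as .../chebi/225/chebi.owl would still need to keep resolving alongside any new date-based ones.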

we are planning to have LITE and CORE variants as well, as you can see on the FTP link, so yes, we would need to generate other ontology IRIs. I guess we would need to include them in the PURL repository?

You can list different products in your OBO metadata entry. Each product has its own PURL. See, for example, CL (https://obofoundry.org/ontology/cl), which follows a common pattern of providing the full ontology plus basic.

cmungall avatar Sep 05 '23 22:09 cmungall

Sorry for the delay.

I would offer different advice from @jmkeil though, CHEBI should be consistent with OBO Metadata standards and OMO (https://obofoundry.org/ontology/omo)

Agree. It should adhere to the community's standards.

many points of the above are already resolved in our current development for ChEBI 2.0 here.

Great.

Are there plans to switch to HTTP instead of FTP for the ontology download? Using HTTP transport encoding for transparent compression (i.e. requesting chebi.owl, but only transferring chebi.owl.gz without further user action) would be a real relief.
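
To illustrate what this would look like from the client side, here is a minimal Python sketch, assuming the server honours the standard Accept-Encoding request header and marks compressed responses with Content-Encoding: gzip. The URL is a placeholder and the helper names are hypothetical; nothing here reflects an existing ChEBI endpoint.

```python
import gzip
import urllib.request

# Placeholder URL, used only for illustration.
OWL_URL = "https://example.org/chebi/chebi.owl"


def build_request(url=OWL_URL):
    """Ask the server for a gzip-compressed representation if it offers one."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(headers, body):
    """Return the plain bytes, undoing gzip only when the server applied it."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


# Offline demonstration with a fake compressed response.
fake_body = gzip.compress(b"<rdf:RDF/>")
print(decode_body({"Content-Encoding": "gzip"}, fake_body))
```

Against a live server, the response from urllib.request.urlopen(build_request()) could be passed in as decode_body(resp.headers, resp.read()); many HTTP clients do this decoding automatically, which is what makes the compression transparent to the user.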

jmkeil avatar Sep 15 '23 13:09 jmkeil

I just want to note that changing parts of the header may also mean changing the IRIs of some concepts used in the ChEBI OWL file today. This would be a breaking change for a number of tools, so I encourage the ChEBI team to announce these changes well ahead of time. For example, a SPARQL query or a parser expecting chebi#is_conjugate_base_of would need to be rewritten to use chebi/is_conjugate_base_of.

By the way, to be clear, I want these changes, just with some advance notice ;)

For rhea/swisslipids/uniprot we can do on-the-fly query rewriting to change chebi# into chebi/, but other users might not have that capability.
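
For users without such a facility, a one-off rewrite of stored queries could be as simple as the Python sketch below. It is only an illustration of the chebi# to chebi/ change discussed in this thread, not how rhea/swisslipids/uniprot implement their rewriting, and the function name is hypothetical.

```python
OLD_NS = "http://purl.obolibrary.org/obo/chebi#"
NEW_NS = "http://purl.obolibrary.org/obo/chebi/"


def rewrite_chebi_iris(text):
    """Replace the old '#'-style ChEBI namespace with the '/'-style one."""
    # Only the exact old namespace is touched; chebi.owl# is left alone.
    return text.replace(OLD_NS, NEW_NS)


query = (
    "SELECT ?base WHERE { ?base "
    "<http://purl.obolibrary.org/obo/chebi#is_conjugate_base_of> ?acid }"
)
print(rewrite_chebi_iris(query))
```

A plain string replacement like this also rewrites PREFIX declarations that spell out the old namespace, but it would miss IRIs assembled at run time, so it is a starting point rather than a complete migration.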

JervenBolleman avatar Jan 24 '24 13:01 JervenBolleman