Potential bug | ValueError: identifier already in use when loading MONDO ontology

Open • Odessit007 opened this issue 1 year ago • 1 comment

Hi. Thanks for the great package. Unfortunately, I ran into an issue while trying to load the MONDO ontology in RDF/XML format using:

import pronto
o = pronto.Ontology.from_obo_library('mondo.owl')

I'm getting the following error:

ValueError: identifier already in use: PATO:0000051 (Relationship('PATO:0000051'))

The full traceback is:

ValueError                                Traceback (most recent call last)
<timed exec> in <module>

/usr/local/lib/python3.10/dist-packages/pronto/ontology.py in from_obo_library(cls, slug, import_depth, timeout, threads)
    204 
    205         """
--> 206         return cls(
    207             f"http://purl.obolibrary.org/obo/{slug}", import_depth, timeout, threads
    208         )

... 4 frames omitted ...
/usr/local/lib/python3.10/dist-packages/pronto/ontology.py in create_term(self, id)
    482         """
    483         if id in self:
--> 484             raise ValueError(f"identifier already in use: {id} ({self[id]})")
    485         self._terms.entities[id] = termdata = TermData(id)
    486         self._terms.lineage[id] = Lineage()

ValueError: identifier already in use: PATO:0000051 (Relationship('PATO:0000051'))

MONDO is a reputable source, and the same file parses fine with rdflib and Protégé, so I wonder whether this might be a bug in pronto.

*** EDIT ***

Digging deeper into the OWL file, I noticed that this entity (PATO:0000051) is declared both as a class and as a data property. A simple SPARQL query on the rdflib-parsed graph showed that there are only two such entities in the MONDO OWL file: PATO:0000051 and PATO:0000070.
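The same kind of check can also be sketched without rdflib. The following is a minimal, stdlib-only sketch (a swapped-in technique, not the SPARQL query mentioned above): it scans the top-level owl:Class, owl:ObjectProperty, owl:DatatypeProperty and owl:AnnotationProperty declarations of an RDF/XML document and reports IRIs declared under more than one type. It assumes entities are declared as direct children of rdf:RDF with an rdf:about attribute; the function find_punned_iris and the SAMPLE fragment are hypothetical and not taken from MONDO.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URIs used in RDF/XML serializations of OWL ontologies.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Top-level OWL declaration elements to compare (assumption: declarations
# appear as direct children of rdf:RDF).
DECLARATION_TYPES = ("Class", "ObjectProperty", "DatatypeProperty", "AnnotationProperty")


def find_punned_iris(root: ET.Element) -> dict[str, set[str]]:
    """Return every IRI that is declared with more than one OWL entity type."""
    types_by_iri: dict[str, set[str]] = defaultdict(set)
    for kind in DECLARATION_TYPES:
        # iterfind with a bare tag only matches direct children of root.
        for elem in root.iterfind(f"{{{OWL}}}{kind}"):
            iri = elem.get(f"{{{RDF}}}about")
            if iri is not None:
                types_by_iri[iri].add(kind)
    return {iri: kinds for iri, kinds in types_by_iri.items() if len(kinds) > 1}


# Hypothetical fragment: one IRI declared as both a class and a property.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/PATO_0000051"/>
  <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/PATO_0000051"/>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0000001"/>
</rdf:RDF>"""

for iri, kinds in find_punned_iris(ET.fromstring(SAMPLE)).items():
    print(iri, sorted(kinds))
# → http://purl.obolibrary.org/obo/PATO_0000051 ['Class', 'ObjectProperty']
```

For a file on disk, `find_punned_iris(ET.parse("mondo.owl").getroot())` should work the same way, although loading a large ontology this way keeps the whole tree in memory.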

Since this felt counterintuitive, I looked through the OWL 2 Primer on w3.org (https://www.w3.org/TR/2012/REC-owl2-primer-20121211/), and it seems this is a legitimate situation. Quoting from it:

However, an IRI may denote different entity-types (e.g. both an individual and a class) at the same time. This possibility, called “punning,” has been introduced to allow for a certain amount of metamodeling; we give an example of this in [Section 9](https://www.w3.org/TR/2012/REC-owl2-primer-20121211/#OWL_2_DL_and_OWL_2_Full). Still, OWL 2 does require some discipline in using and reusing names. To allow a more readable syntax, and for other technical reasons, OWL 2 DL requires that a name is not used for more than one property type (object, datatype or annotation property) nor can an IRI denote both a class and a datatype.

While being both a class and a datatype is prohibited, nothing here indicates that an entity can't be both a class and a property. (Disclaimer: I'm not an OWL expert and haven't read the full specification, so this is just my attempt to investigate the issue.)
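The two reuse restrictions in the quoted passage can be written as a small predicate over the set of entity types an IRI is declared with. This is a minimal sketch that encodes only those two restrictions; it is not a complete OWL 2 DL checker, and the function name allowed_in_owl2_dl and the type labels are hypothetical.

```python
# Property types among which an IRI may be used at most once (per the Primer quote).
PROPERTY_TYPES = frozenset({"ObjectProperty", "DatatypeProperty", "AnnotationProperty"})


def allowed_in_owl2_dl(kinds: set[str]) -> bool:
    """Apply only the two name-reuse restrictions quoted from the OWL 2 Primer."""
    # Restriction 1: a name is not used for more than one property type.
    if len(kinds & PROPERTY_TYPES) > 1:
        return False
    # Restriction 2: an IRI cannot denote both a class and a datatype.
    if {"Class", "Datatype"} <= kinds:
        return False
    return True


print(allowed_in_owl2_dl({"Class", "DatatypeProperty"}))         # True
print(allowed_in_owl2_dl({"Class", "Datatype"}))                 # False
print(allowed_in_owl2_dl({"ObjectProperty", "AnnotationProperty"}))  # False
```

Under these two rules alone, a class that is also a single kind of property passes, which matches the reading in the paragraph above.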

It seems that pronto prohibits such IRI reuse because in rdfxml.py (as of pronto version 2.5.5) there is this piece of code:

            for prop in tree.iterfind(_NS["owl"]["ObjectProperty"]):
                self._extract_object_property(prop, curies)
            for prop in tree.iterfind(_NS["owl"]["AnnotationProperty"]):
                self._extract_annotation_property(prop, curies)
            for class_ in tree.iterfind(_NS["owl"]["Class"]):
                self._extract_term(class_, curies)
            for axiom in tree.iterfind(_NS["owl"]["Axiom"]):
                self._process_axiom(axiom, curies)

which (as far as I understand) registers an IRI the first time it is encountered as an object property, annotation property, class, or axiom, and raises the error above if the same IRI is reused.
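To make the failure mode concrete, here is a toy model (hypothetical classes, not pronto's actual implementation). It contrasts a single identifier namespace shared by terms and relationships, which is what the `create_term` check in the traceback suggests, with a punning-aware registry that records the entity kind per IRI. Only the error-message format is copied from the traceback; the rest is an assumption for illustration.

```python
class SingleNamespaceRegistry:
    """Toy model: terms and relationships share one identifier space."""

    def __init__(self) -> None:
        self._kinds: dict[str, str] = {}

    def create(self, kind: str, ident: str) -> None:
        # Any reuse of an identifier is rejected, whatever the entity kind.
        if ident in self._kinds:
            raise ValueError(
                f"identifier already in use: {ident} ({self._kinds[ident]}({ident!r}))"
            )
        self._kinds[ident] = kind


class PunningAwareRegistry:
    """Toy alternative: the same IRI may be declared once per entity kind."""

    def __init__(self) -> None:
        self._kinds: dict[str, set[str]] = {}

    def create(self, kind: str, ident: str) -> None:
        kinds = self._kinds.setdefault(ident, set())
        # Only a repeated declaration of the same kind is treated as an error.
        if kind in kinds:
            raise ValueError(f"{kind} {ident} declared twice")
        kinds.add(kind)

    def kinds_of(self, ident: str) -> set[str]:
        return set(self._kinds.get(ident, set()))


# Same order as the loop above: properties are visited before classes.
DECLARATIONS = [("Relationship", "PATO:0000051"), ("Term", "PATO:0000051")]

strict = SingleNamespaceRegistry()
try:
    for kind, ident in DECLARATIONS:
        strict.create(kind, ident)
except ValueError as err:
    print(err)  # identifier already in use: PATO:0000051 (Relationship('PATO:0000051'))

lenient = PunningAwareRegistry()
for kind, ident in DECLARATIONS:
    lenient.create(kind, ident)
print(sorted(lenient.kinds_of("PATO:0000051")))  # ['Relationship', 'Term']
```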

Odessit007 • Nov 18 '23 23:11

Hi, is there any update on this or a workaround?

Edit: it seems that if you downgrade to pronto==2.5.2, everything works fine.

MatthewCorney • Feb 22 '24 13:02