pronto
Potential bug | ValueError: identifier already in use when loading MONDO ontology
Hi. Thanks for the great package. Unfortunately, I encountered an issue while trying to load the MONDO ontology in RDF/XML format using
import pronto
o = pronto.Ontology.from_obo_library('mondo.owl')
I'm getting the following error:
ValueError: identifier already in use: PATO:0000051 (Relationship('PATO:0000051'))
A full traceback is
ValueError Traceback (most recent call last)
<timed exec> in <module>
/usr/local/lib/python3.10/dist-packages/pronto/ontology.py in from_obo_library(cls, slug, import_depth, timeout, threads)
204
205 """
--> 206 return cls(
207 f"http://purl.obolibrary.org/obo/{slug}", import_depth, timeout, threads
208 )
4 frames
/usr/local/lib/python3.10/dist-packages/pronto/ontology.py in create_term(self, id)
482 """
483 if id in self:
--> 484 raise ValueError(f"identifier already in use: {id} ({self[id]})")
485 self._terms.entities[id] = termdata = TermData(id)
486 self._terms.lineage[id] = Lineage()
ValueError: identifier already in use: PATO:0000051 (Relationship('PATO:0000051'))
MONDO is quite a reputable source, and the same file is parsed just fine by rdflib and Protégé, so I wonder if it might be a bug in pronto?
*** EDIT ***
Diving deeper into the OWL file, I noticed that this entity (PATO:0000051) is both a class and a data property. Running a simple SPARQL query on the rdflib-parsed graph showed that there are only two such entities in the MONDO OWL file: PATO:0000051 and PATO:0000070.
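For anyone who wants to reproduce that check, here is a minimal sketch. It is not the SPARQL query I ran: it uses only the standard library's `xml.etree.ElementTree` instead of rdflib, and it only counts declarations written as top-level typed elements (e.g. `<owl:Class rdf:about="...">`). The function name `find_punned_iris` and the tiny `SAMPLE` document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
ENTITY_TYPES = ("Class", "ObjectProperty", "DatatypeProperty", "AnnotationProperty")


def find_punned_iris(root):
    """Return {iri: set of entity types} for IRIs declared with more than one type.

    `root` is the <rdf:RDF> element of an RDF/XML document. Only direct
    children of the root that carry an rdf:about attribute are considered.
    """
    kinds = defaultdict(set)
    for kind in ENTITY_TYPES:
        for element in root.iterfind(f"{{{OWL}}}{kind}"):
            iri = element.get(f"{{{RDF}}}about")
            if iri is not None:
                kinds[iri].add(kind)
    return {iri: found for iri, found in kinds.items() if len(found) > 1}


# Invented three-line document, only to show the shape of the input.
SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/PATO_0000051"/>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/PATO_0000051"/>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0000001"/>
</rdf:RDF>
"""

punned = find_punned_iris(ET.fromstring(SAMPLE))
for iri, found in punned.items():
    print(iri, sorted(found))
```

On a real file, something like `find_punned_iris(ET.parse("mondo.owl").getroot())` should work, assuming the file has been downloaded locally under that name.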
As this felt counter-intuitive to me, I glanced over the OWL docs on w3.org (https://www.w3.org/TR/2012/REC-owl2-primer-20121211/) and it seems that this is a legitimate situation. Citing from the docs:
However, an IRI may denote different entity-types (e.g. both an individual and a class) at the same time. This possibility, called “punning,” has been introduced to allow for a certain amount of metamodeling; we give an example of this in [Section 9](https://www.w3.org/TR/2012/REC-owl2-primer-20121211/#OWL_2_DL_and_OWL_2_Full). Still, OWL 2 does require some discipline in using and reusing names. To allow a more readable syntax, and for other technical reasons, OWL 2 DL requires that a name is not used for more than one property type (object, datatype or annotation property) nor can an IRI denote both a class and a datatype.
While being both a class and a datatype is prohibited, nothing here indicates that an entity can't be both a class and a property. (Disclaimer: I'm not an OWL expert and I haven't read the full specification, so this is just my attempt to investigate the issue.)
It seems that pronto prohibits such IRI reuse because in rdfxml.py (as of pronto version 2.5.5) there is this piece of code:
for prop in tree.iterfind(_NS["owl"]["ObjectProperty"]):
    self._extract_object_property(prop, curies)
for prop in tree.iterfind(_NS["owl"]["AnnotationProperty"]):
    self._extract_annotation_property(prop, curies)
for class_ in tree.iterfind(_NS["owl"]["Class"]):
    self._extract_term(class_, curies)
for axiom in tree.iterfind(_NS["owl"]["Axiom"]):
    self._process_axiom(axiom, curies)
which (as far as I understand) registers an IRI the first time it is encountered as an object property, annotation property, class, or axiom, and raises the above-mentioned error if the same IRI is then reused for a different kind of entity.
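To illustrate what I think is happening, here is a toy model. It is not pronto's actual code: `ToyOntology` and `load_in_parser_order` are hypothetical names, and the only thing borrowed from pronto is the idea, visible in the traceback above, that `create_term` refuses any identifier already present in the ontology. With a single shared identifier namespace and properties registered before classes, a punned IRI fails on its second declaration.

```python
class ToyOntology:
    """Hypothetical stand-in: terms and relationships share one ID namespace."""

    def __init__(self):
        self._terms = {}
        self._relationships = {}

    def __contains__(self, id):
        # An identifier counts as "in use" whether it names a term or a relationship.
        return id in self._terms or id in self._relationships

    def create_relationship(self, id):
        if id in self:
            raise ValueError(f"identifier already in use: {id}")
        self._relationships[id] = ("Relationship", id)

    def create_term(self, id):
        if id in self:
            raise ValueError(f"identifier already in use: {id}")
        self._terms[id] = ("Term", id)


def load_in_parser_order(declarations):
    """Register properties first, then classes, mirroring the loop order quoted above.

    `declarations` is a list of (kind, id) pairs, kind being "property" or "class".
    """
    onto = ToyOntology()
    for kind, id in declarations:
        if kind == "property":
            onto.create_relationship(id)
    for kind, id in declarations:
        if kind == "class":
            onto.create_term(id)
    return onto


try:
    load_in_parser_order([("class", "PATO:0000051"), ("property", "PATO:0000051")])
except ValueError as err:
    print(err)
```

If this reading is right, the error does not depend on the order of declarations in the file: the property is always registered first, so the class declaration is the one that fails.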
Hi, is there any update on this or a workaround?
Edit: it seems that if you downgrade to pronto==2.5.2, it works fine.