ontology-access-kit
adapter.entity_metadata_map("HP:0000001") with obo adapter causes `DuplicateURIPrefixes` error
Version: oaklib 0.5.25
Reproduces with both the pronto and simpleobo adapters
Minimal test
from oaklib import get_adapter
example = """
format-version: 1.2
data-version: hp/releases/2024-02-25
ontology: hp.obo
[Term]
id: HP:0000001
name: All
"""
file_path = "example.obo"
# Open the file in write mode ('w'). This will create the file if it does not exist
# or overwrite it if it does.
with open(file_path, 'w') as file:
    # Write the string to the file
    file.write(example)
adapter = get_adapter("simpleobo:example.obo")
m = adapter.entity_metadata_map("HP:0000001")
Error: DuplicateURIPrefixes
DuplicateURIPrefixes                      Traceback (most recent call last)
Cell In[16], line 22
     19 file.write(example)
     21 adapter = get_adapter("simpleobo:example.obo")
---> 22 m = adapter.entity_metadata_map("HP:0000001")
File ~/.pyenv/versions/3.11.7/envs/babelon/lib/python3.11/site-packages/oaklib/implementations/simpleobo/simple_obo_implementation.py:620, in SimpleOboImplementation.entity_metadata_map(self, curie)
    618 m[DEPRECATED_PREDICATE].append(True)
    619 m[HAS_OBSOLESCENCE_REASON].append(TERMS_MERGED)
--> 620 self.add_missing_property_values(curie, m)
    621 return dict(m)
File ~/.pyenv/versions/3.11.7/envs/babelon/lib/python3.11/site-packages/oaklib/interfaces/basic_ontology_interface.py:1460, in BasicOntologyInterface.add_missing_property_values(self, curie, metadata_map)
   1458 if PREFIX_PREDICATE not in metadata_map:
   1459 metadata_map[PREFIX_PREDICATE] = [prefix]
-> 1460 uri = self.curie_to_uri(curie, False)
   1461 if uri:
   1462 if URL_PREDICATE not in metadata_map:
File ~/.pyenv/versions/3.11.7/envs/babelon/lib/python3.11/site-packages/oaklib/interfaces/basic_ontology_interface.py:240, in BasicOntologyInterface.curie_to_uri(self, curie, strict)
    238 raise ValueError(f"Invalid CURIE: {curie}")
    239 return None
--> 240 rv = self.converter.expand(curie)
    241 if rv is None and strict:
...
http://www.geneontology.org/formats/oboInOwl#:
prefix='oio' uri_prefix='http://www.geneontology.org/formats/oboInOwl#' prefix_synonyms=[] uri_prefix_synonyms=[] pattern=None
prefix='oboInOwl' uri_prefix='http://www.geneontology.org/formats/oboInOwl#' prefix_synonyms=[] uri_prefix_synonyms=[] pattern=None
You might need to share more details about your environment, because I cannot reproduce this here.
I tried in a clean virtualenv with the latest oaklib (0.5.25), and it works just fine.
I also tried in a clean virtualenv set up with babelon 0.2.4 (in case the problem came from a Babelon-specific dependency): same result, no errors at all.
Full list of packages with their versions:
Package Version
-------------------------- ---------------
airium 0.2.6
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.3.0
appdirs 1.4.4
arrow 1.3.0
attrs 23.2.0
Babel 2.14.0
babelon 0.2.4
bcp47 0.0.4
beautifulsoup4 4.12.3
cattrs 23.2.3
certifi 2024.2.2
CFGraph 0.2.1
chardet 5.2.0
charset-normalizer 3.3.2
class_resolver 0.4.3
click 8.1.7
click-default-group 1.2.4
colorama 0.4.6
curies 0.7.7
Deprecated 1.2.14
deprecation 2.1.0
distro 1.9.0
EditorConfig 0.12.4
et-xmlfile 1.1.0
eutils 0.6.0
fastobo 0.12.3
fqdn 1.5.1
funowl 0.2.3
ghp-import 2.1.0
graphviz 0.20.1
h11 0.14.0
hbreader 0.9.1
httpcore 1.0.4
httpx 0.27.0
idna 3.6
ijson 3.2.3
importlib-metadata 7.0.1
importlib_resources 6.1.2
iniconfig 2.0.0
isodate 0.6.1
isoduration 20.11.0
Jinja2 3.1.3
jsbeautifier 1.15.1
json-flattener 0.1.9
jsonasobj 1.3.1
jsonasobj2 1.0.4
jsonpatch 1.33
jsonpath-ng 1.6.1
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kgcl-rdflib 0.5.0
kgcl_schema 0.6.4
lark 1.1.9
linkml 1.7.5
linkml-dataops 0.1.0
linkml-renderer 0.3.0
linkml-runtime 1.7.2
llm 0.13.1
lxml 5.1.0
Markdown 3.5.2
MarkupSafe 2.1.5
mergedeep 1.3.4
mkdocs 1.5.3
mkdocs-material 9.5.11
mkdocs-material-extensions 1.3.1
mkdocs-mermaid2-plugin 0.6.0
more-click 0.1.2
ndex2 3.8.0
networkx 3.2.1
numpy 1.26.4
oaklib 0.5.25
ols-client 0.1.4
ontoportal-client 0.0.4
openai 1.12.0
openpyxl 3.1.2
packaging 23.2
paginate 0.5.6
pandas 2.2.1
pansql 0.0.1
parse 1.20.1
pathspec 0.12.1
pip 23.3.2
platformdirs 4.2.0
pluggy 1.4.0
ply 3.11
prefixcommons 0.1.12
prefixmaps 0.2.2
pronto 2.5.6
pydantic 2.6.3
pydantic_core 2.16.3
Pygments 2.17.2
PyJSG 0.11.10
pymdown-extensions 10.7
pyparsing 3.1.1
PyShEx 0.8.1
PyShExC 0.9.1
pysolr 3.9.0
pystow 0.5.3
pytest 8.0.2
pytest-logging 2015.11.4
python-dateutil 2.8.2
python-dotenv 1.0.1
python-ulid 2.2.0
PyTrie 0.4.0
pytz 2024.1
PyYAML 6.0.1
pyyaml_env_tag 0.1
ratelimit 2.2.1
rdflib 7.0.0
rdflib-jsonld 0.6.1
rdflib-shim 1.0.3
referencing 0.33.0
regex 2023.12.25
requests 2.31.0
requests-cache 1.2.0
requests-toolbelt 1.0.0
rfc3339-validator 0.1.4
rfc3987 1.3.8
rpds-py 0.18.0
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
scipy 1.12.0
semsimian 0.2.12
semsql 0.3.3
setuptools 69.0.3
ShExJSG 0.8.2
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
soupsieve 2.5
sparqlslurper 0.5.1
SPARQLWrapper 2.0.0
SQLAlchemy 2.0.27
SQLAlchemy-Utils 0.38.3
sqlite-fts4 1.0.3
sqlite-migrate 0.1b0
sqlite-utils 3.36
sssom 0.4.4
sssom-schema 0.15.0
tabulate 0.9.0
tqdm 4.66.2
types-python-dateutil 2.8.19.20240106
typing_extensions 4.10.0
tzdata 2024.1
uri-template 1.3.0
url-normalize 1.4.3
urllib3 2.2.1
validators 0.22.0
watchdog 4.0.0
webcolors 1.13
wheel 0.42.0
wrapt 1.16.0
xmltodict 0.13.0
zipp 3.17.0
The joy of pip install -U. :/ Thanks for making me think in this direction (other dependencies). It was, indeed, an older 0.6.X curies version that caused the issue. Sorry about the noise.
Reopening as it was indeed an issue. This does not work:
from oaklib import get_adapter
example = """
format-version: 1.2
data-version: hp/releases/2024-02-25
default-namespace: human_phenotype
idspace: dc http://purl.org/dc/elements/1.1/
idspace: oboInOwl http://www.geneontology.org/formats/oboInOwl#
idspace: owl http://www.w3.org/2002/07/owl#
idspace: rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
idspace: rdfs http://www.w3.org/2000/01/rdf-schema#
idspace: terms http://purl.org/dc/terms/
idspace: xml http://www.w3.org/XML/1998/namespace
idspace: xsd http://www.w3.org/2001/XMLSchema#
ontology: hp.obo
[Term]
id: HP:0000001
name: All
"""
file_path = "example.obo"
# Open the file in write mode ('w'). This will create the file if it does not exist
# or overwrite it if it does.
with open(file_path, 'w') as file:
    # Write the string to the file
    file.write(example)
adapter = get_adapter("pronto:example.obo")
m = adapter.entity_metadata_map("HP:0000001")
print(m)
If you remove
idspace: oboInOwl http://www.geneontology.org/formats/oboInOwl#
it works. This suggests that we need to handle this somehow before @balhoff's PR is merged.
As far as I understand, the problem is as follows:
- The BasicOntologyInterface’s prefix_map() default implementation creates a default prefix map made of the “OBO context”. Presumably the OBO context contains an entry oio -> http://www.geneontology.org/formats/oboInOwl#.
- The ProntoImplementation’s __post_init__() method adds to that default prefix map the prefixes declared in the OBO file’s idspace tags:
    for prefix, expansion in ontology.metadata.idspaces.items():
        self.prefix_map()[prefix] = expansion[0]
  (The SimpleOboImplementation does the same thing.)
- Now the prefix map contains both oio -> http://www.geneontology.org/formats/oboInOwl# (from the OBO context) and oboInOwl -> http://www.geneontology.org/formats/oboInOwl# (from the ontology’s own map).
- The curies converter does not accept that at all and errors out.
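The sequence above can be modelled in a few lines of plain Python. This is a minimal sketch, not oaklib code: `OBO_CONTEXT` is a hypothetical two-entry stand-in for the real OBO context, `idspaces` mirrors a subset of the `idspace` tags from the example file, and `find_duplicate_uri_prefixes` is a hypothetical helper performing the kind of check that makes the curies converter raise `DuplicateURIPrefixes`.

```python
from collections import defaultdict

# Hypothetical stand-in for the default "OBO context" prefix map (the real one is much larger).
OBO_CONTEXT = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "oio": "http://www.geneontology.org/formats/oboInOwl#",
}

# A subset of the prefixes declared via `idspace` tags in the example OBO file.
idspaces = {
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def find_duplicate_uri_prefixes(prefix_map):
    """Return {uri_prefix: [prefix, ...]} for every URI prefix claimed by more than one prefix name."""
    by_uri = defaultdict(list)
    for prefix, uri_prefix in prefix_map.items():
        by_uri[uri_prefix].append(prefix)
    return {uri: sorted(names) for uri, names in by_uri.items() if len(names) > 1}


# Start from the default context, then naively add every idspace declaration,
# mimicking the two-line loop in the Pronto/SimpleOBO implementations.
prefix_map = dict(OBO_CONTEXT)
for prefix, expansion in idspaces.items():
    prefix_map[prefix] = expansion

# The oboInOwl URI prefix is now claimed by two prefix names.
print(find_duplicate_uri_prefixes(prefix_map))
# {'http://www.geneontology.org/formats/oboInOwl#': ['oboInOwl', 'oio']}
```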
I am not sure I understand why having two prefix names pointing to the same URI prefix must be an error. I understand that the opposite (the same prefix name pointing to two different URI prefixes) would obviously be wrong (though that cannot happen here, since a prefix name already present in the OBO context is simply overwritten by the idspace declaration), but not this direction.
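One likely reason for treating it as an error (an interpretation, not something stated in this thread): expansion (CURIE to URI) stays unambiguous, but compression (URI back to CURIE) has to choose between the competing prefix names. A toy illustration with hypothetical `expand` and `compress` helpers, not the curies API:

```python
# Prefix map where two prefix names share one URI prefix, as in the merged map described above.
prefix_map = {
    "oio": "http://www.geneontology.org/formats/oboInOwl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}


def expand(curie, prefix_map):
    """Hypothetical CURIE -> URI expansion: a dictionary lookup, so never ambiguous."""
    prefix, _, local_id = curie.partition(":")
    return prefix_map[prefix] + local_id


def compress(uri, prefix_map):
    """Hypothetical URI -> CURIE compression: list every CURIE the URI could map back to."""
    return sorted(
        f"{prefix}:{uri[len(uri_prefix):]}"
        for prefix, uri_prefix in prefix_map.items()
        if uri.startswith(uri_prefix)
    )


uri = expand("oio:hasExactSynonym", prefix_map)
print(uri)  # http://www.geneontology.org/formats/oboInOwl#hasExactSynonym
print(compress(uri, prefix_map))  # ['oboInOwl:hasExactSynonym', 'oio:hasExactSynonym']
```

Both CURIEs expand to the same URI, but going back there is no principled way to pick one of the two candidates.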
Anyway, if we do consider it wrong to have two prefix names pointing to the same URI prefix, both the Pronto and the SimpleOBO implementations must be amended, because the two-line loop above is too naive: instead of simply adding each idspace declaration to the existing prefix map, it must first check whether the map already contains another prefix name pointing to the same URI prefix, and remove it.
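A minimal sketch of that amended loop, written as a hypothetical standalone helper `add_idspace` operating on a plain dict. The real implementations write into `self.prefix_map()`, and the `(expansion, description)` tuple shape used below is an assumption matching the `expansion[0]` indexing in the original loop.

```python
def add_idspace(prefix_map, prefix, expansion):
    """Register an idspace declaration, first dropping any other prefix name
    that already points to the same URI prefix (the declared name wins)."""
    for existing in [p for p, uri in prefix_map.items() if uri == expansion and p != prefix]:
        del prefix_map[existing]
    prefix_map[prefix] = expansion


# Default map, standing in for the OBO context.
prefix_map = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "oio": "http://www.geneontology.org/formats/oboInOwl#",
}
# Hypothetical stand-in for ontology.metadata.idspaces: prefix -> (expansion, description).
idspaces = {"oboInOwl": ("http://www.geneontology.org/formats/oboInOwl#", None)}

for prefix, expansion in idspaces.items():
    add_idspace(prefix_map, prefix, expansion[0])

print(prefix_map)
# {'HP': 'http://purl.obolibrary.org/obo/HP_', 'oboInOwl': 'http://www.geneontology.org/formats/oboInOwl#'}
```

Giving the declared name precedence keeps the ontology's own idspace in control, at the cost of silently dropping the context's alias (`oio` here).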
By the way, anyone could run into this problem at any time, independently of @balhoff’s PR. His PR merely makes it more likely to come across OBO files containing idspace tags, but anyone can already craft OBO files with such tags if they want.
Solution to this: in basic_ontology_interface.py, this line needs to be:
self._converter = curies.Converter.from_prefix_map(self.prefix_map(), strict=False)
This asks the curies package to be less strict and allow duplicate URI prefixes. As you can see, it's an easy fix.
The questions are:
- What do we want the default to be? (As of now it is strict=True, following the lead of the curies package.)
- Do we allow this flag to be a parameter the user can control from anywhere? (This will need a careful refactor, I think.)
cc: @cmungall
Another possible fix would be to change how the prefix map is constructed here:
https://github.com/INCATools/ontology-access-kit/blob/15bf85cefc2fe8541b38aabfcf7c65eb46bc1231/src/oaklib/interfaces/basic_ontology_interface.py#L58
If we built it the way we do in sssom-py (with ChainMap), we could create a prefix map with precedence rules that yield a consistent final product. I assume that having conflicting prefix maps (multiple prefixes for the same URI) could be confusing for day-to-day business.
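A sketch of the ChainMap idea, assuming the precedence rule "declared idspace prefixes override the default OBO context". This is not the sssom-py code, only an illustration of the approach: `collections.ChainMap` resolves conflicting prefix names by layer order, and the hypothetical helper `consistent_prefix_map` then keeps only the highest-precedence name for each URI prefix, so the result is one-to-one.

```python
from collections import ChainMap


def consistent_prefix_map(*maps):
    """Merge prefix maps given in decreasing order of precedence into a one-to-one map.

    ChainMap resolves conflicting prefix *names* (the first map wins); the loop then
    resolves conflicting URI prefixes by keeping the highest-precedence name.
    """
    chain = ChainMap(*maps)
    result = {}
    claimed = set()  # URI prefixes already assigned to a higher-precedence name
    for layer in maps:  # walk layers from highest to lowest precedence
        for prefix in layer:
            uri_prefix = chain[prefix]  # effective value after name-level precedence
            if prefix in result or uri_prefix in claimed:
                continue
            result[prefix] = uri_prefix
            claimed.add(uri_prefix)
    return result


OIO = "http://www.geneontology.org/formats/oboInOwl#"
# Highest precedence: prefixes declared in the OBO file's idspace tags.
idspaces = {"oboInOwl": OIO, "HP": "http://purl.obolibrary.org/obo/HP_"}
# Lowest precedence: a hypothetical slice of the default OBO context.
obo_context = {
    "oio": OIO,
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}

merged = consistent_prefix_map(idspaces, obo_context)
print(merged)
# {'oboInOwl': 'http://www.geneontology.org/formats/oboInOwl#', 'HP': 'http://purl.obolibrary.org/obo/HP_', 'GO': 'http://purl.obolibrary.org/obo/GO_'}
```

Because every URI prefix ends up with exactly one name, a converter built from such a map would not hit the duplicate check, whatever the `strict` default.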