
adapter.entity_metadata_map("HP:0000001") with obo adapter causes `DuplicateURIPrefixes` error

Open matentzn opened this issue 11 months ago • 7 comments

Version: oaklib 0.5.25

Replicates with both pronto and simpleobo adapters

Minimal test

from oaklib import get_adapter

example = """
format-version: 1.2
data-version: hp/releases/2024-02-25
ontology: hp.obo

[Term]
id: HP:0000001
name: All
"""

file_path = "example.obo"

# Open the file in write mode ('w'). This will create the file if it does not exist
# or overwrite it if it does.
with open(file_path, 'w') as file:
    # Write the string to the file
    file.write(example)

adapter = get_adapter("simpleobo:example.obo")
m = adapter.entity_metadata_map("HP:0000001")
Error: DuplicateURIPrefixes
DuplicateURIPrefixes                      Traceback (most recent call last)
Cell In[16], line 22
     19     file.write(example)
     21 adapter = get_adapter("simpleobo:example.obo")
---> 22 m = adapter.entity_metadata_map("HP:0000001")

File ~/.pyenv/versions/3.11.7/envs/babelon/lib/python3.11/site-packages/oaklib/implementations/simpleobo/simple_obo_implementation.py:620, in SimpleOboImplementation.entity_metadata_map(self, curie)
    618     m[DEPRECATED_PREDICATE].append(True)
    619     m[HAS_OBSOLESCENCE_REASON].append(TERMS_MERGED)
--> 620 self.add_missing_property_values(curie, m)
    621 return dict(m)

File ~/.pyenv/versions/3.11.7/envs/babelon/lib/python3.11/site-packages/oaklib/interfaces/basic_ontology_interface.py:1460, in BasicOntologyInterface.add_missing_property_values(self, curie, metadata_map)
   1458 if PREFIX_PREDICATE not in metadata_map:
   1459     metadata_map[PREFIX_PREDICATE] = [prefix]
-> 1460 uri = self.curie_to_uri(curie, False)
   1461 if uri:
   1462     if URL_PREDICATE not in metadata_map:

File ~/.pyenv/versions/3.11.7/envs/babelon/lib/python3.11/site-packages/oaklib/interfaces/basic_ontology_interface.py:240, in BasicOntologyInterface.curie_to_uri(self, curie, strict)
    238         raise ValueError(f"Invalid CURIE: {curie}")
    239     return None
--> 240 rv = self.converter.expand(curie)
    241 if rv is None and strict:
...

http://www.geneontology.org/formats/oboInOwl#:
	prefix='oio' uri_prefix='http://www.geneontology.org/formats/oboInOwl#' prefix_synonyms=[] uri_prefix_synonyms=[] pattern=None
	prefix='oboInOwl' uri_prefix='http://www.geneontology.org/formats/oboInOwl#' prefix_synonyms=[] uri_prefix_synonyms=[] pattern=None

matentzn avatar Feb 26 '24 19:02 matentzn

You might need to share more details about your environment, because I cannot replicate here.

Tried in a clean virtualenv with the latest oaklib 0.5.25, it works just fine.

Also tried with a clean virtualenv setup with babelon 0.2.4 (in case the problem came from a Babelon-specific dependency), same: no errors at all.

Full list of packages with their version:
Package                    Version
-------------------------- ---------------
airium                     0.2.6
annotated-types            0.6.0
antlr4-python3-runtime     4.9.3
anyio                      4.3.0
appdirs                    1.4.4
arrow                      1.3.0
attrs                      23.2.0
Babel                      2.14.0
babelon                    0.2.4
bcp47                      0.0.4
beautifulsoup4             4.12.3
cattrs                     23.2.3
certifi                    2024.2.2
CFGraph                    0.2.1
chardet                    5.2.0
charset-normalizer         3.3.2
class_resolver             0.4.3
click                      8.1.7
click-default-group        1.2.4
colorama                   0.4.6
curies                     0.7.7
Deprecated                 1.2.14
deprecation                2.1.0
distro                     1.9.0
EditorConfig               0.12.4
et-xmlfile                 1.1.0
eutils                     0.6.0
fastobo                    0.12.3
fqdn                       1.5.1
funowl                     0.2.3
ghp-import                 2.1.0
graphviz                   0.20.1
h11                        0.14.0
hbreader                   0.9.1
httpcore                   1.0.4
httpx                      0.27.0
idna                       3.6
ijson                      3.2.3
importlib-metadata         7.0.1
importlib_resources        6.1.2
iniconfig                  2.0.0
isodate                    0.6.1
isoduration                20.11.0
Jinja2                     3.1.3
jsbeautifier               1.15.1
json-flattener             0.1.9
jsonasobj                  1.3.1
jsonasobj2                 1.0.4
jsonpatch                  1.33
jsonpath-ng                1.6.1
jsonpointer                2.4
jsonschema                 4.21.1
jsonschema-specifications  2023.12.1
kgcl-rdflib                0.5.0
kgcl_schema                0.6.4
lark                       1.1.9
linkml                     1.7.5
linkml-dataops             0.1.0
linkml-renderer            0.3.0
linkml-runtime             1.7.2
llm                        0.13.1
lxml                       5.1.0
Markdown                   3.5.2
MarkupSafe                 2.1.5
mergedeep                  1.3.4
mkdocs                     1.5.3
mkdocs-material            9.5.11
mkdocs-material-extensions 1.3.1
mkdocs-mermaid2-plugin     0.6.0
more-click                 0.1.2
ndex2                      3.8.0
networkx                   3.2.1
numpy                      1.26.4
oaklib                     0.5.25
ols-client                 0.1.4
ontoportal-client          0.0.4
openai                     1.12.0
openpyxl                   3.1.2
packaging                  23.2
paginate                   0.5.6
pandas                     2.2.1
pansql                     0.0.1
parse                      1.20.1
pathspec                   0.12.1
pip                        23.3.2
platformdirs               4.2.0
pluggy                     1.4.0
ply                        3.11
prefixcommons              0.1.12
prefixmaps                 0.2.2
pronto                     2.5.6
pydantic                   2.6.3
pydantic_core              2.16.3
Pygments                   2.17.2
PyJSG                      0.11.10
pymdown-extensions         10.7
pyparsing                  3.1.1
PyShEx                     0.8.1
PyShExC                    0.9.1
pysolr                     3.9.0
pystow                     0.5.3
pytest                     8.0.2
pytest-logging             2015.11.4
python-dateutil            2.8.2
python-dotenv              1.0.1
python-ulid                2.2.0
PyTrie                     0.4.0
pytz                       2024.1
PyYAML                     6.0.1
pyyaml_env_tag             0.1
ratelimit                  2.2.1
rdflib                     7.0.0
rdflib-jsonld              0.6.1
rdflib-shim                1.0.3
referencing                0.33.0
regex                      2023.12.25
requests                   2.31.0
requests-cache             1.2.0
requests-toolbelt          1.0.0
rfc3339-validator          0.1.4
rfc3987                    1.3.8
rpds-py                    0.18.0
ruamel.yaml                0.18.6
ruamel.yaml.clib           0.2.8
scipy                      1.12.0
semsimian                  0.2.12
semsql                     0.3.3
setuptools                 69.0.3
ShExJSG                    0.8.2
six                        1.16.0
sniffio                    1.3.1
sortedcontainers           2.4.0
soupsieve                  2.5
sparqlslurper              0.5.1
SPARQLWrapper              2.0.0
SQLAlchemy                 2.0.27
SQLAlchemy-Utils           0.38.3
sqlite-fts4                1.0.3
sqlite-migrate             0.1b0
sqlite-utils               3.36
sssom                      0.4.4
sssom-schema               0.15.0
tabulate                   0.9.0
tqdm                       4.66.2
types-python-dateutil      2.8.19.20240106
typing_extensions          4.10.0
tzdata                     2024.1
uri-template               1.3.0
url-normalize              1.4.3
urllib3                    2.2.1
validators                 0.22.0
watchdog                   4.0.0
webcolors                  1.13
wheel                      0.42.0
wrapt                      1.16.0
xmltodict                  0.13.0
zipp                       3.17.0

gouttegd avatar Feb 28 '24 09:02 gouttegd

The joy of pip install -U. :/ Thanks for making me think in this direction (other dependencies). It was, indeed, an older 0.6.X curies version that caused the issue. Sorry about the noise.

matentzn avatar Feb 28 '24 10:02 matentzn

Reopening as it was indeed an issue. This does not work:

from oaklib import get_adapter

example = """
format-version: 1.2
data-version: hp/releases/2024-02-25
default-namespace: human_phenotype
idspace: dc http://purl.org/dc/elements/1.1/ 
idspace: oboInOwl http://www.geneontology.org/formats/oboInOwl# 
idspace: owl http://www.w3.org/2002/07/owl# 
idspace: rdf http://www.w3.org/1999/02/22-rdf-syntax-ns# 
idspace: rdfs http://www.w3.org/2000/01/rdf-schema# 
idspace: terms http://purl.org/dc/terms/ 
idspace: xml http://www.w3.org/XML/1998/namespace 
idspace: xsd http://www.w3.org/2001/XMLSchema# 
ontology: hp.obo

[Term]
id: HP:0000001
name: All
"""

file_path = "example.obo"

# Open the file in write mode ('w'). This will create the file if it does not exist
# or overwrite it if it does.
with open(file_path, 'w') as file:
    # Write the string to the file
    file.write(example)

adapter = get_adapter("pronto:example.obo")
m = adapter.entity_metadata_map("HP:0000001")
print(m)

If you remove

idspace: oboInOwl http://www.geneontology.org/formats/oboInOwl# 

it works. This suggests that we need to handle this case somehow by the time @balhoff's PR is merged.

matentzn avatar Feb 29 '24 16:02 matentzn

As far as I understand, the problem is as follows:

  1. The BasicOntologyInterface’s prefix_map() default implementation creates a default prefix map made of the “OBO context”. Presumably the OBO context map contains an entry oio -> http://www.geneontology.org/formats/oboInOwl#.

  2. The ProntoImplementation’s __post_init__() method adds to that default prefix map the prefixes declared in the OBO file’s idspace tags:

for prefix, expansion in ontology.metadata.idspaces.items():
    self.prefix_map()[prefix] = expansion[0]

(The SimpleOboImplementation does the same thing.)

  3. Now the prefix map contains both oio -> http://www.geneontology.org/formats/oboInOwl# (from the OBO context) and oboInOwl -> http://www.geneontology.org/formats/oboInOwl# (from the ontology’s own map).

  4. The curies converter does not like that at all and errors out.
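The situation described in the steps above can be reproduced without oaklib or curies. The following is a minimal sketch in plain Python, not the actual oaklib code: `find_duplicate_uri_prefixes` is a hypothetical helper, and the one-entry default map is only a stand-in for the OBO context, whose real content is assumed here.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_uri_prefixes(prefix_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every URI prefix that more than one prefix name expands to."""
    names_by_uri: Dict[str, List[str]] = defaultdict(list)
    for name, uri_prefix in prefix_map.items():
        names_by_uri[uri_prefix].append(name)
    return {uri: names for uri, names in names_by_uri.items() if len(names) > 1}


OIO = "http://www.geneontology.org/formats/oboInOwl#"

# Stand-in for the default (OBO context) prefix map; assumed content.
prefix_map = {"oio": OIO}

# Stand-in for the idspace tags parsed from the OBO file header.
idspaces = {"oboInOwl": OIO, "owl": "http://www.w3.org/2002/07/owl#"}

# The naive merge: it adds a second name for a URI prefix that is already mapped.
for name, uri_prefix in idspaces.items():
    prefix_map[name] = uri_prefix

duplicates = find_duplicate_uri_prefixes(prefix_map)
print(duplicates)
```

After the merge, `duplicates` lists the oboInOwl URI prefix with both names, which is the same pair reported at the end of the traceback.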

I am not sure I understand why having two prefix names pointing to the same URI prefix must be an error. The other way round (the same prefix name pointing to two different URI prefixes) would obviously be wrong, but that cannot happen here, since an existing prefix name in the OBO context is automatically replaced by the declared one. I do not see why the first direction should be a problem.

Anyway, if we do consider it wrong to have two prefix names pointing to the same URI prefix, both the Pronto and the SimpleOBO implementations must be amended, because the two-line snippet highlighted above is too naive: instead of simply adding the content of the idspace declarations to the existing prefix map, it must first check whether the prefix map already contains another prefix name pointing to the same URI prefix, and remove it.
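A minimal sketch of that check, assuming the prefix map is a plain dict of prefix name to URI prefix. `add_prefix` is a hypothetical helper name and is not part of oaklib.

```python
from typing import Dict


def add_prefix(prefix_map: Dict[str, str], name: str, uri_prefix: str) -> None:
    """Map `name` to `uri_prefix`, first dropping any other name with the same expansion."""
    # Collect the stale names before deleting, so the dict is not mutated while iterating.
    stale = [n for n, u in prefix_map.items() if u == uri_prefix and n != name]
    for n in stale:
        del prefix_map[n]
    prefix_map[name] = uri_prefix


OIO = "http://www.geneontology.org/formats/oboInOwl#"

# Assumed starting point: a default map that already knows the URI prefix as "oio".
merged = {"oio": OIO, "HP": "http://purl.obolibrary.org/obo/HP_"}

# The declared idspace replaces the default name instead of sitting next to it.
add_prefix(merged, "oboInOwl", OIO)
print(merged)
```

One consequence to keep in mind: once `oio` is removed, any CURIE written with that name can no longer be expanded through this map, so code that relies on the default name would need to be checked.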

gouttegd avatar Feb 29 '24 17:02 gouttegd

By the way, anyone could run into this problem anytime, independently of @balhoff’s PR. His PR merely makes it more likely to come across OBO files containing idspace tags, but anyone can already craft OBO files with such tags if they want.

gouttegd avatar Feb 29 '24 18:02 gouttegd

Solution to this: In basic_ontology_interface.py, this line needs to be

self._converter = curies.Converter.from_prefix_map(self.prefix_map(), strict=False)

This asks the curies package to be less strict and allow duplicate URI prefixes. As you can see, it's an easy fix.

The questions are:

  • What do we want the default to be? (It is strict=True as of now, following the lead of the curies package.)
  • Do we allow this flag to be a parameter controlled by the user from anywhere? (This will need a careful refactor, I think.)

cc: @cmungall

hrshdhgd avatar Mar 06 '24 15:03 hrshdhgd

Another possible fix would be to fix

https://github.com/INCATools/ontology-access-kit/blob/15bf85cefc2fe8541b38aabfcf7c65eb46bc1231/src/oaklib/interfaces/basic_ontology_interface.py#L58

So, the way the prefix map is constructed. If we built it the way we do in sssom-py (with ChainMap), we could create a prefix map with precedence rules that would yield a consistent final product. I assume that having conflicting prefix maps (multiple prefixes for the same URI) could be confusing in day-to-day business.
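A rough sketch of what such precedence rules could look like, using `collections.ChainMap` from the standard library. This is not sssom-py's code: `layered_prefix_map` is a hypothetical helper, and the contents of both layers are assumed for illustration. Note that ChainMap on its own only resolves conflicts by prefix name, so the sketch adds a second pass that keeps a single name per URI prefix.

```python
from collections import ChainMap
from typing import Dict, Mapping


def layered_prefix_map(*layers: Mapping[str, str]) -> Dict[str, str]:
    """Flatten layers (highest precedence first) into a map with unique names and URI prefixes."""
    # Looking up a name in the ChainMap returns the value from the first layer defining it.
    chained = ChainMap(*layers)
    result: Dict[str, str] = {}
    seen_uri_prefixes = set()
    # Walk the layers in precedence order so the first name seen for a URI prefix wins.
    for layer in layers:
        for name in layer:
            uri_prefix = chained[name]
            if name in result or uri_prefix in seen_uri_prefixes:
                continue
            result[name] = uri_prefix
            seen_uri_prefixes.add(uri_prefix)
    return result


OIO = "http://www.geneontology.org/formats/oboInOwl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

declared = {"oboInOwl": OIO}            # from the ontology's idspace tags
defaults = {"oio": OIO, "rdfs": RDFS}   # assumed stand-in for the OBO context

# Declared prefixes take precedence over the defaults.
result = layered_prefix_map(declared, defaults)
print(result)
```

With the declared layer first, `oboInOwl` wins and `oio` is dropped; swapping the argument order would keep the default name instead, which is the kind of policy decision the precedence rules would have to settle.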

matentzn avatar Mar 06 '24 15:03 matentzn