
Invalid IRIs in results of the Bioregistry SPARQL endpoint

Open • hartig opened this issue 1 year ago • 1 comment

To reproduce the issue, run the following query against the SPARQL endpoint (https://bioregistry.io/sparql).

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?o WHERE {
    <http://identifiers.org/ensembl/ENSG00000006125> owl:sameAs ?o
}

The result contains invalid IRIs such as http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125.
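The square brackets are the problem here: RFC 3986/3987 only permit `[` and `]` inside the host part of an IRI (for IP literals), never in the path or query. The following is a minimal sketch of a check for that, not Jena's actual validation logic. The function name `find_illegal_characters` is made up for illustration, and the check is deliberately simplified (it does not validate percent-encoding or the scheme).

```python
# Characters that are never allowed unescaped anywhere in a URI/IRI.
ALWAYS_ILLEGAL = set('<>"{}|\\^` ')


def _after_authority(iri: str) -> str:
    """Return the part of the IRI that follows the authority (host) component."""
    _, sep, rest = iri.partition("://")
    if not sep:
        # No authority component at all, so nothing may contain brackets.
        return iri
    for index, char in enumerate(rest):
        if char in "/?#":  # the authority ends at the first of these
            return rest[index:]
    return ""


def find_illegal_characters(iri: str) -> list[str]:
    """Return the sorted, de-duplicated illegal characters found in the IRI."""
    illegal = {char for char in iri if char in ALWAYS_ILLEGAL}
    # Square brackets are only legal inside the authority (IP literals).
    illegal |= {char for char in _after_authority(iri) if char in "[]"}
    return sorted(illegal)


bad = "http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125"
good = "http://identifiers.org/ensembl/ENSG00000006125"
print(find_illegal_characters(bad))
print(find_illegal_characters(good))
```

Running a check like this over the `?o` bindings of the query would flag the IRI above while leaving the identifiers.org IRI alone.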

I discovered this issue while running such queries from a program built on the Apache Jena library. When the program tries to print the result of this query, Jena throws the following exception.

<http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
org.apache.jena.irix.IRIException: <http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.

hartig, Apr 26 '23 07:04

Hi @hartig, thanks for letting us know about this and for including an example.

This might be coming into the Bioregistry from Prefix Commons. We can fix it either in the way the Bioregistry loads URI prefixes into the curies data structure, or directly upstream in the curies package. I'm at the Biocuration 2023 conference now, but I will try to address this by the end of the week.
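As a rough illustration of the first option, filtering at load time might look like the sketch below. It uses plain dictionaries rather than the real curies or Bioregistry data structures, and the names `is_safe_uri_prefix` and `drop_unsafe_uri_prefixes` as well as the sample record are hypothetical. The second URI prefix is only inferred from the invalid IRI reported above.

```python
import re

# Scheme plus authority, e.g. "http://bacteria.ensembl.org" (simplified).
_AUTHORITY = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://[^/?#]*")
# Characters that may not appear unescaped after the authority.
_ILLEGAL = re.compile(r'[\[\]<>"{}|\\^`\s]')


def is_safe_uri_prefix(uri_prefix: str) -> bool:
    """Check that nothing after the authority contains an illegal character."""
    remainder = _AUTHORITY.sub("", uri_prefix, count=1)
    return _ILLEGAL.search(remainder) is None


def drop_unsafe_uri_prefixes(records: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only the URI prefixes that would expand to syntactically valid IRIs."""
    return {
        prefix: [uri for uri in uri_prefixes if is_safe_uri_prefix(uri)]
        for prefix, uri_prefixes in records.items()
    }


# Hypothetical input: one CURIE prefix mapped to its known URI prefixes.
records = {
    "ensembl": [
        "http://identifiers.org/ensembl/",
        "http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=",
    ],
}
cleaned = drop_unsafe_uri_prefixes(records)
print(cleaned)
```

Dropping such templates outright is the bluntest choice. Percent-encoding the brackets would also make the IRIs parse, but they would still not resolve to anything useful, because `[?species_name]` is an unfilled placeholder.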

cthoyt, Apr 26 '23 08:04