bioregistry
Invalid IRIs in results of the Bioregistry SPARQL endpoint
To reproduce the issue, run the following query on the SPARQL endpoint (https://bioregistry.io/sparql):
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?o WHERE {
<http://identifiers.org/ensembl/ENSG00000006125> owl:sameAs ?o
}
The result contains invalid IRIs such as http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125 (square brackets are not permitted in the path or query of an IRI).
I discovered this issue when sending such queries from a program built on the Apache Jena library. In particular, when I try to print the result of this query, Jena throws the following exception:
<http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
org.apache.jena.irix.IRIException: <http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
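To see exactly which characters trigger the error, here is a minimal Python sketch (standard library only) that splits an IRI with the regular expression from RFC 3986, Appendix B, and reports characters that the RFC 3986 grammar does not allow in the path, query or fragment. It is an approximation, not Jena's validator: the authority is not checked, and non-ASCII characters are simply accepted as a rough stand-in for the RFC 3987 ucschar range. The names check_iri and illegal_chars are hypothetical.

```python
import re

# RFC 3986, Appendix B: split a URI reference into
# scheme, authority, path, query and fragment.
URI_SPLIT = re.compile(r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$")

UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | {":", "@"}
PATH_CHARS = PCHAR | {"/"}
QUERY_CHARS = PCHAR | {"/", "?"}  # the fragment uses the same character set


def illegal_chars(component: str, allowed: set[str]) -> list[str]:
    """Return the characters in `component` that the grammar does not allow.

    '%' is accepted only when it starts a %XX escape. Non-ASCII characters
    (code point >= 0xA0) are accepted as an approximation of RFC 3987 ucschar.
    """
    bad = []
    i = 0
    while i < len(component):
        ch = component[i]
        if ch == "%":
            if re.fullmatch(r"%[0-9A-Fa-f]{2}", component[i:i + 3]):
                i += 3
                continue
            bad.append(ch)
        elif ch not in allowed and ord(ch) < 0xA0:
            bad.append(ch)
        i += 1
    return bad


def check_iri(iri: str) -> dict[str, list[str]]:
    """Map each offending component to its illegal characters ({} if none found)."""
    _scheme, _authority, path, query, fragment = URI_SPLIT.match(iri).groups()
    problems = {}
    for name, value, allowed in (
        ("path", path, PATH_CHARS),
        ("query", query, QUERY_CHARS),
        ("fragment", fragment, QUERY_CHARS),
    ):
        if value:
            bad = illegal_chars(value, allowed)
            if bad:
                problems[name] = bad
    return problems


print(check_iri("http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125"))
```

For the IRI from the query result, the `?` inside the brackets starts the query, so the path is just `/[`. The sketch therefore flags `[` in the path and `]` in the query, which lines up with Jena's ILLEGAL_CHARACTER in PATH report.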
Hi @hartig, thanks for letting us know about this and for including an example.
This might be something coming into the Bioregistry from Prefix Commons. We can fix this either in the way the Bioregistry loads the URI prefixes into the curies data structure, or directly upstream in the curies package. I'm at the Biocuration 2023 conference now but will try to address this by the end of the week.
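On the Bioregistry side, one possible shape for such a fix is to drop URI formats that cannot expand to a valid IRI before they are loaded. The sketch below is hypothetical and is not the Bioregistry's or curies' actual code. It assumes URI formats are plain strings with a `$1` placeholder for the local identifier, keyed by a provider name, and it only rejects characters that RFC 3986 never allows after the authority (square brackets, braces, spaces and a few others) rather than performing full IRI validation. The names is_usable_format and filter_formats, and the provider keys in the example, are illustrative.

```python
import re

# Characters RFC 3986 never permits in a path, query or fragment
# (square brackets are only legal inside an IPv6 host literal).
FORBIDDEN_AFTER_AUTHORITY = set('[]{}<>"\\^`| ')

# RFC 3986, Appendix B, reduced to: skip scheme and authority, capture the rest.
URI_SPLIT = re.compile(r"^(?:[^:/?#]+:)?(?://[^/?#]*)?(.*)$")


def is_usable_format(uri_format: str, sample_id: str = "ENSG00000006125") -> bool:
    """Expand a `$1`-style URI format with a sample identifier and reject it
    if the part after the authority contains a never-legal character."""
    expanded = uri_format.replace("$1", sample_id)
    rest = URI_SPLIT.match(expanded).group(1)
    return not (FORBIDDEN_AFTER_AUTHORITY & set(rest))


def filter_formats(formats: dict[str, str]) -> dict[str, str]:
    """Keep only the providers whose URI format passes the check above."""
    return {name: fmt for name, fmt in formats.items() if is_usable_format(fmt)}


formats = {
    "ensembl": "http://identifiers.org/ensembl/$1",
    "ensembl.bacteria": "http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=$1",
}
print(filter_formats(formats))
```

Filtering at load time keeps a single malformed upstream entry from leaking into every owl:sameAs result; the same check could equally live upstream in curies if that turns out to be the better place for it.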