ROBOT query --update strips prefixes when output is not .owl (RDF/XML)
The robot query command, when run with --update, drops prefixes when the output is .ofn or .omn, but not when it is .owl (these are the only formats I tested). This looks like the same issue as #1101, except that it occurs with robot update queries and was not fixed by PR https://github.com/ontodev/robot/pull/1106 (it still happens in 1.9.5). The doid-edit.owl input file is in .ofn format. It happens with every SPARQL update query I've tried, including a completely empty one (see bottom).
Prefixes dropped
.ofn output loses prefixes:
robot query -i doid-edit.owl --update fix_whitespace.rq -o tmp.ofn \
&& mv tmp.ofn doid-edit.owl
Chaining convert doesn't help:
robot \
query -i doid-edit.owl --update fix_whitespace.rq \
convert -o tmp.ofn \
&& mv tmp.ofn doid-edit.owl
Separate convert doesn't help (for .ofn or .omn):
robot query -i doid-edit.owl --update fix_whitespace.rq -o tmp.omn \
&& robot convert -i tmp.omn -o doid-edit.owl --format ofn \
&& rm tmp.omn
Prefixes preserved
.owl output preserves prefixes:
robot query -i doid-edit.owl --update fix_whitespace.rq -o tmp.owl \
&& robot convert -i tmp.owl -o doid-edit.owl --format ofn \
&& rm tmp.owl
Using --add-prefixes also works (my current workaround):
robot --add-prefixes prefixes.json \
query -i doid-edit.owl --update fix_whitespace.rq -o tmp.ofn \
&& mv tmp.ofn doid-edit.owl
SPARQL queries
fix_whitespace.rq:
# remove extra whitespace from ALL strings (e.g. in defs, xrefs, labels, etc.)
# -> removes 2+ spaces, spaces before commas or periods, and spaces at beginning or end of string
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE { ?s ?p ?o . }
INSERT { ?s ?p ?new_o . }
WHERE {
  ?s ?p ?o .
  FILTER( datatype(?o) = xsd:string )
  BIND(
    REPLACE(
      REPLACE(?o, " (,) *| +", "$1 "),
      " (\\.)| +$|^ +", "$1"
    ) AS ?new_o
  )
}
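To make the two nested REPLACE calls easier to follow, here is a rough Python equivalent. It is only a sketch: Python's re module is close to, but not identical to, the XPath regex dialect that SPARQL uses, and the function name fix_whitespace is made up for illustration.

```python
import re


def fix_whitespace(text: str) -> str:
    """Approximate the nested REPLACE calls from fix_whitespace.rq."""
    # Inner REPLACE: " ,<spaces>" becomes ", " and any run of spaces
    # becomes a single space. Group 1 is empty when the second
    # alternative matches, so the replacement is just " ".
    collapsed = re.sub(r" (,) *| +", r"\1 ", text)
    # Outer REPLACE: drop a space before a period, plus leading and
    # trailing spaces. Again group 1 is empty unless " ." matched.
    return re.sub(r" (\.)| +$|^ +", r"\1", collapsed)


print(fix_whitespace("  A  rare disease , with fever .  "))
# A rare disease, with fever.
```

Note that the update query only touches literals whose datatype is xsd:string, while this sketch works on any Python string.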
Empty SPARQL update query:
DELETE { }
INSERT { }
WHERE {
  ?s a owl:Class .
}
prefixes.json:
{
  "@context": {
    "obo": "http://purl.obolibrary.org/obo/",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "terms": "http://purl.org/dc/terms/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "doid": "http://purl.obolibrary.org/obo/doid#"
  }
}
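For anyone using the same --add-prefixes workaround, a file like this could probably be generated from the Prefix(...) declarations in the original .ofn file instead of being written by hand. The sketch below assumes the usual functional-syntax form Prefix(name:=<IRI>); the helper name prefixes_to_context is hypothetical, and the unnamed default prefix is skipped because it has no name to use as a JSON key.

```python
import json
import re

# Matches OWL functional-syntax declarations such as
#   Prefix(obo:=<http://purl.obolibrary.org/obo/>)
PREFIX_LINE = re.compile(
    r"^Prefix\(\s*([^:\s]*):\s*=\s*<([^>]*)>\s*\)", re.MULTILINE
)


def prefixes_to_context(ofn_text: str) -> dict:
    """Build a {"@context": {...}} mapping from Prefix() declarations."""
    context = {}
    for name, iri in PREFIX_LINE.findall(ofn_text):
        if name:  # skip the default ":" prefix (assumption: not needed)
            context[name] = iri
    return {"@context": context}


def write_prefix_file(ofn_path: str, json_path: str) -> None:
    """Read an .ofn file and write the matching prefix JSON file."""
    with open(ofn_path, encoding="utf-8") as handle:
        context = prefixes_to_context(handle.read())
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(context, handle, indent=2)
```

Calling write_prefix_file("doid-edit.owl", "prefixes.json") before running the update would keep the JSON file in sync with whatever prefixes the edit file currently declares.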
Thanks for pointing to #1106, which uses isPrefixOWLOntologyFormat() to check whether a format should use prefixes. That should be correct. In this case, robot query converts the input ontology to Turtle, loads it into Jena, runs the SPARQL update, converts the result back to Turtle, and reads it into OWLAPI again. My guess is that the format of the input ontology is being lost along the way. If I'm right, then the prefixes aren't being preserved for RDF/XML either, but we might be setting decent default prefixes in that case.
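One way to check that guess without touching the Java code might be to compare the Prefix(...) declarations in the input with those in the output of the update. This is only a diagnostic sketch for .ofn files; the function names are made up, and it assumes dropped prefixes show up as missing Prefix() lines in the output.

```python
import re

# Captures the prefix name (including the colon) and its IRI from
# OWL functional-syntax lines like Prefix(obo:=<http://...>)
PREFIX_LINE = re.compile(
    r"^Prefix\(\s*([^:\s]*:)\s*=\s*<([^>]*)>\s*\)", re.MULTILINE
)


def declared_prefixes(ofn_text: str) -> dict:
    """Map each declared prefix name to its IRI."""
    return dict(PREFIX_LINE.findall(ofn_text))


def dropped_prefixes(before: str, after: str) -> dict:
    """Return prefixes declared in `before` but absent from `after`."""
    kept = declared_prefixes(after)
    return {
        name: iri
        for name, iri in declared_prefixes(before).items()
        if name not in kept
    }
```

Running dropped_prefixes on the contents of doid-edit.owl and tmp.ofn should list exactly the prefixes that did not survive the round trip.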
Do you (or anyone reading this) have time to dig into this issue? I have some big deadlines coming up.
I'd love to help more but I don't have sufficient expertise with Java (or sufficient familiarity with the internal workings of ROBOT/OWLAPI) to delve into this. My apologies.
It seems @souzadevinicius is interested in looking at this, but it's possibly quite a complex issue. We will see.