obo-relations
Literal decorations and SPARQL
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | wc -l
5280
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep "^<" | wc -l
2813
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep -v "^<" | wc -l
2467
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep -v "^<" | grep -c "\^^"
235
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep -v "^<" | grep -c "\@"
298
This tells me there are about fifty-two hundred statements, about half of which have literal objects. Of those literals, roughly ten percent are tagged with a language, and a different ten percent are tagged with an XML datatype.
This matters because of SPARQL.
SPARQL treats any decoration as an intrinsic part of the value. This causes problems when the decoration state is unknown or, worse, inconsistent: several queries instead of one are needed to be sure results involving a literal are true and complete.
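As a rough illustration of the point above: an exact pattern such as `?s ?p "part of"` will not match `"part of"@en`, so each decoration needs its own query unless the comparison is made on the lexical form with `STR()`. The sketch below prints such a query; the function name `any_decoration_query` and the example label are assumptions made for illustration, not anything taken from RO or ROBOT.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a SPARQL query that matches a lexical form
# regardless of its decoration (plain, @lang, or ^^datatype).
# Assumption: the argument contains no double quotes or backslashes,
# since no SPARQL escaping is done here.
any_decoration_query() {
    cat <<EOF
SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  # STR() returns the lexical form only, dropping language tag and datatype
  FILTER(isLiteral(?o) && STR(?o) = "$1")
}
EOF
}
```

Calling `any_decoration_query 'part of'` prints a single query that stands in for the separate plain, language-tagged, and typed variants, at the cost of a filter scan instead of a direct term match.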
I have not just now scoured RO for literals that would be identical except for decorations, but I have noticed them in ontologies in general. If nothing validates decoration consistency within an ontology, the decorations are or will become inconsistent.
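The check described above could be sketched along the same lines as the earlier pipelines. This is only a sketch under stated assumptions: the input is line-oriented N-Triples like `ro.nt`, subjects and predicates contain no spaces, and the function name `mixed_decorations` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list literal lexical forms that occur with more than
# one decoration (plain, @lang, or ^^<datatype>) in an N-Triples file.
mixed_decorations() {
    cut -d' ' -f3- "$1" |      # object plus the trailing " ."
    grep '^"' |                # keep literal objects only
    awk '{
        sub(/[ \t]*\.[ \t]*$/, "")          # drop the statement terminator
        deco = ""
        # closing quote followed by a language tag or datatype at end of line
        if (match($0, /"(@[A-Za-z0-9-]+|\^\^<[^>]*>)$/)) {
            deco = substr($0, RSTART + 1)   # the decoration itself
            $0 = substr($0, 1, RSTART)      # the quoted lexical form
        }
        print $0 "\t" deco
    }' |
    LC_ALL=C sort -u |         # one row per distinct (form, decoration) pair
    cut -f1 |                  # keep the lexical form
    LC_ALL=C sort | uniq -d    # forms that appear with two or more decorations
}
```

Running `mixed_decorations ro.nt` would print each quoted lexical form that shows up under differing decorations; empty output would suggest the file is consistent in this narrow sense.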
I have mixed feelings about which way decoration within ontologies should go.
On one hand I do like explicit types, and within an ontology is about the best chance for consistency (compared with datasets in the wild).
But the most realistic way for a SPARQL endpoint to achieve consistency in the near term is to strip out all the decorations.
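For an endpoint loading N-Triples, stripping decorations could be as small as one `sed` pass before loading. This is a minimal sketch, assuming one statement per line terminated by ` .`; the function name `strip_decorations` is hypothetical, and the rewrite deliberately throws away type and language information (for example, `"1"^^xsd:integer` becomes the plain string `"1"`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite an N-Triples file so every literal object is
# plain, removing a trailing @lang or ^^<datatype>. IRI objects and plain
# literals pass through unchanged.
strip_decorations() {
    sed -E 's/"(@[A-Za-z0-9-]+|\^\^<[^>]*>)[[:space:]]*\.[[:space:]]*$/" ./' "$1"
}
```

Something like `strip_decorations ro.nt > ro.plain.nt` would produce the file to load. The pattern is anchored to the end of the statement so that an `@` or `^^` inside the literal text itself is left alone.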
This is a very good point - do you want to make a ticket on the ROBOT tracker to include this in reporting/repair? I suspect RO is not the only one to have these issues.
On 5 Aug 2018, at 13:10, Tom Conlin wrote:
> [snip: quoted copy of the original post]
View it on GitHub: https://github.com/oborel/obo-relations/issues/247
What is the status of this?
Is there still an action item here or can this be closed?
Closing as won't do; please open a new ticket with specific action items as needed.
We are attacking this from a different angle with oak validation and OMO. Good you closed it.