
Literal decorations and SPARQL

Open TomConlin opened this issue 7 years ago • 3 comments

tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | wc -l
5280
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep "^<" | wc -l
2813
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep -v "^<" | wc -l
2467
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep -v "^<" | grep -c "\^^" 
235
tomc@cypher ~/junq $ cut -f3- -d' ' ro.nt | grep -v "^<" | grep -c "\@" 
298

This tells me that there are about fifty-two hundred statements, of which about half have literal objects; of those literals, about ten percent are tagged with a language, and a different ten percent are tagged with an XML data type.
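A slightly stricter way to get the same breakdown, as a minimal sketch: the greps above match `@` or `^^` anywhere in the object, so they may overcount when a literal's own text contains those characters. The function below anchors on the end of the statement instead and counts blank nodes separately. The name `classify_objects` is made up for illustration, and the sketch assumes one triple per line with single spaces between terms and a trailing ` .`.

```shell
# Classify the object of each N-Triples statement read from stdin.
# Assumes one triple per line, single-space separators, trailing " .".
classify_objects() {
  cut -d' ' -f3- |
  awk '
    /^</   { n["iri"]++;   next }   # object is an IRI
    /^_:/  { n["bnode"]++; next }   # object is a blank node
    # closing quote followed by ^^<datatype> at the end of the statement
    /"\^\^<[^>]*>[ \t]*\.[ \t]*$/                { n["datatyped"]++; next }
    # closing quote followed by @lang at the end of the statement
    /"@[A-Za-z]+(-[A-Za-z0-9]+)*[ \t]*\.[ \t]*$/ { n["lang"]++;      next }
    /^"/   { n["plain"]++; next }   # any other literal is undecorated
    END    { for (k in n) print k, n[k] }
  ' | LC_ALL=C sort
}
```

Usage would be `classify_objects < ro.nt`, which prints one `category count` pair per line.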

This matters because of SPARQL.

SPARQL queries consider any decoration an intrinsic part of the value. This causes problems when the decoration state of a literal is unknown or, worse, inconsistent: several queries instead of one are needed to be sure that results involving a literal are true and complete.

I have not just now scoured RO for literals that would be identical except for their decorations, but I have noticed them in ontologies in general. If an ontology has no validation of decoration consistency, its literals are or will become inconsistent.
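A rough sketch of what such a check could look like on an N-Triples dump: group literals by their quoted lexical form and report any form that shows up with more than one decoration. The function name `decoration_conflicts` is hypothetical, and the same input assumptions apply as for the counts above (one triple per line, single spaces, trailing ` .`). The comparison is purely textual; under RDF 1.1 a plain literal and the same text typed as `xsd:string` are the same term, so that particular pair is a style inconsistency and not a query hazard.

```shell
# Report lexical forms that occur with more than one decoration.
# Reads N-Triples on stdin; prints: "form": decoration decoration ...
decoration_conflicts() {
  cut -d' ' -f3- |
  awk '
    /^"/ {
      line = $0
      sub(/[ \t]*\.[ \t]*$/, "", line)   # drop the statement terminator
      deco = "(none)"
      # RSTART lands on the closing quote when a decoration is present
      if (match(line, /"\^\^<[^>]*>$/) ||
          match(line, /"@[A-Za-z]+(-[A-Za-z0-9]+)*$/)) {
        deco = substr(line, RSTART + 1)  # the ^^<...> or @lang suffix
        line = substr(line, 1, RSTART)   # the quoted lexical form
      }
      key = line SUBSEP deco
      if (!(key in seen)) {              # count each pairing once
        seen[key] = 1
        count[line]++
        decos[line] = decos[line] " " deco
      }
    }
    END { for (l in count) if (count[l] > 1) print l ":" decos[l] }
  ' | LC_ALL=C sort
}
```

Running `decoration_conflicts < ro.nt` with no output would suggest no such textual conflicts under these assumptions.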

I have mixed feelings about which way decoration within ontologies should go. On one hand, I do like explicit types, and an ontology is about the best chance for consistency (compared with datasets in the wild). But the most realistic way for a SPARQL endpoint to achieve consistency in the near term is to strip out all the decorations.
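For illustration only, stripping decorations from an N-Triples file before loading it could be sketched with `sed`. The function name `strip_decorations` is invented here, and the sketch assumes each statement ends with ` .` on its own line. It is lossy: values that differed only by language tag or datatype become identical.

```shell
# Remove a trailing @lang or ^^<datatype> from literal objects.
# Reads N-Triples on stdin and writes the stripped statements to stdout.
strip_decorations() {
  # Match the closing quote, then the decoration, then the terminator;
  # keep only the quote and a normalized " ." terminator.
  sed -E 's/"(\^\^<[^>]*>|@[A-Za-z]+(-[A-Za-z0-9]+)*)[[:space:]]*\.[[:space:]]*$/" ./'
}
```

Something like `strip_decorations < ro.nt | sort -u > ro.stripped.nt` would also drop the duplicate statements that stripping can create (the output file name is just an example).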

TomConlin avatar Aug 05 '18 20:08 TomConlin

This is a very good point - do you want to make a ticket on the ROBOT tracker to include this in reporting/repair? I suspect RO is not the only one to have these issues.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/oborel/obo-relations/issues/247

cmungall avatar Aug 05 '18 22:08 cmungall

What is the status of this?

nlharris avatar Oct 16 '20 06:10 nlharris

Is there still an action item here or can this be closed?

nlharris avatar Jan 06 '22 21:01 nlharris

Closing as won't do; please open a new ticket with specific action items as needed.

nlharris avatar Oct 30 '22 23:10 nlharris

We are attacking this from a different angle with oak validation and OMO. Good you closed it.

matentzn avatar Oct 31 '22 06:10 matentzn