Inconsistent string typing in SPARQL JSON results
Hello,
When results are returned as application/sparql-results+json, the datatype of strings is inconsistent depending on character content.
If the string contains only ASCII characters, no datatype is added.
If it includes non-ASCII characters, the datatype http://www.w3.org/2001/XMLSchema#string is attached.
This inconsistency causes issues further down the line (e.g. in DELETE queries), as Virtuoso does not treat both forms as equivalent.
Virtuoso version
Virtuoso Open Source Edition (Column Store) (multi threaded)
Version 7.2.10.3237-pthreads as of Jul 12 2023 (000000)
Compiled for Linux (x86_64-pc-linux-gnu)
Copyright (C) 1998-2023 OpenLink Software
Below are steps to reproduce the issue.
Let us know if you're able to reproduce it on your side.
Thanks!
Step: insert the data
INSERT DATA {
GRAPH <http://debug> {
<http://test/1> <http://bar> "België spelled correctly".
<http://test/2> <http://bar> "Belgie spelled incorrectly".
}
}
Step: check the data with ASCII-only
SELECT DISTINCT ?s ?p ?o WHERE {
GRAPH <http://debug> {
?s ?p ?o
}
VALUES ?s {
<http://test/2>
}
VALUES ?p {
<http://bar>
}
VALUES ?o {
"Belgie spelled incorrectly"
}
}
Return as XML 🆗
<sparql
xmlns="http://www.w3.org/2005/sparql-results#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
<head>
<variable name="s"/>
<variable name="p"/>
<variable name="o"/>
</head>
<results distinct="false" ordered="true">
<result>
<binding name="s">
<uri>http://test/2</uri>
</binding>
<binding name="p">
<uri>http://bar</uri>
</binding>
<binding name="o">
<literal>Belgie spelled incorrectly</literal>
</binding>
</result>
</results>
</sparql>
Return as application/sparql-results+json 🆗
{
"head": {
"link": [],
"vars": [
"s",
"p",
"o"
]
},
"results": {
"distinct": false,
"ordered": true,
"bindings": [
{
"s": {
"type": "uri",
"value": "http://test/2"
},
"p": {
"type": "uri",
"value": "http://bar"
},
"o": {
"type": "literal",
"value": "Belgie spelled incorrectly"
}
}
]
}
}
Step: check the data with non-ASCII
SELECT DISTINCT ?s ?p ?o WHERE {
GRAPH <http://debug> {
?s ?p ?o
}
VALUES ?s {
<http://test/1>
}
VALUES ?p {
<http://bar>
}
VALUES ?o {
"België spelled correctly"
}
}
Return as XML 🆗
<sparql
xmlns="http://www.w3.org/2005/sparql-results#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
<head>
<variable name="s"/>
<variable name="p"/>
<variable name="o"/>
</head>
<results distinct="false" ordered="true">
<result>
<binding name="s">
<uri>http://test/1</uri>
</binding>
<binding name="p">
<uri>http://bar</uri>
</binding>
<binding name="o">
<literal>België spelled correctly</literal>
</binding>
</result>
</results>
</sparql>
Return as application/sparql-results+json ❌
{
"head": {
"link": [],
"vars": [
"s",
"p",
"o"
]
},
"results": {
"distinct": false,
"ordered": true,
"bindings": [
{
"s": {
"type": "uri",
"value": "http://test/1"
},
"p": {
"type": "uri",
"value": "http://bar"
},
"o": {
"type": "typed-literal",
"datatype": "http://www.w3.org/2001/XMLSchema#string",
"value": "Belgi\\u00EB spelled correctly"
}
}
]
}
}
We are looking into this ...
The value in "value": "Belgi\u00EB spelled correctly" is a valid JSON-escaped
Unicode string and will be parsed properly by any JSON-aware parser.
This can be seen when the Virtuoso JSON output:
{ "head": { "link": [], "vars": ["s", "p", "o"] },
"results": { "distinct": false, "ordered": true, "bindings": [
{ "s": { "type": "uri", "value": "http://test/1" } , "p": { "type": "uri", "value": "http://bar/" } , "o": { "type": "typed-literal", "datatype": "http://www.w3.org/2001/XMLSchema#string", "value": "Belgi\u00EB spelled correctly" }} ] } }
is copied into a JSON aware parser like https://jsonformatter.org/json-parser which returns:
{
"head": {
"link": [],
"vars": [
"s",
"p",
"o"
]
},
"results": {
"distinct": false,
"ordered": true,
"bindings": [
{
"s": {
"type": "uri",
"value": "http://test/1"
},
"p": {
"type": "uri",
"value": "http://bar/"
},
"o": {
"type": "typed-literal",
"datatype": "http://www.w3.org/2001/XMLSchema#string",
"value": "België spelled correctly"
}
}
]
}
}
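The same check can be done locally rather than through a web formatter. The following is a minimal sketch using Python's standard `json` module; the `raw` string is just the "o" binding copied from the output above, and the variable names are illustrative only.

```python
import json

# The "o" binding as it appears in the response body. A raw string is used
# so that the backslash-u escape reaches the JSON parser untouched.
raw = (
    r'{"type": "typed-literal", '
    r'"datatype": "http://www.w3.org/2001/XMLSchema#string", '
    r'"value": "Belgi\u00EB spelled correctly"}'
)

binding = json.loads(raw)

# The parser turns the escape sequence into the actual character.
decoded_value = binding["value"]
print(decoded_value)

# The escaping is fine, but the term is still reported as a typed literal.
print(binding["type"], binding["datatype"])
```

This only confirms that the escaping is valid JSON; it does not say anything about why the datatype is attached.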
@HughWilliams — If I understand correctly, the concern raised by @cecemel is that while the input datatypes are the same (or at least, are both unset) —
INSERT DATA {
GRAPH <http://debug> {
<http://test/1> <http://bar> "België spelled correctly".
<http://test/2> <http://bar> "Belgie spelled incorrectly".
}
}
— the output datatypes aren't the same, i.e. —
"o": {
"type": "literal",
"value": "Belgie spelled incorrectly"
}
vs
"o": {
"type": "typed-literal",
"datatype": "http://www.w3.org/2001/XMLSchema#string",
"value": "België spelled correctly"
}
On that basis, I'm reopening the issue, pending indication by @cecemel that their concern is resolved.
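Until a fix is available, a client could smooth over the difference itself. The following is a rough sketch only, assuming the client is free to treat an `xsd:string` typed literal as a plain literal (RDF 1.1 considers the two forms the same term). `normalize_term` is a hypothetical helper, not part of Virtuoso or any library, and whether the normalized form then matches in a later DELETE on Virtuoso has not been verified here.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize_term(term):
    """Return a copy of a SPARQL JSON term with xsd:string typing dropped.

    Hypothetical client-side helper: URIs, language-tagged literals and
    literals with any other datatype are returned unchanged.
    """
    is_literal = term.get("type") in ("literal", "typed-literal")
    if is_literal and term.get("datatype") == XSD_STRING and "xml:lang" not in term:
        return {"type": "literal", "value": term["value"]}
    return dict(term)


# The two "o" bindings from the results above.
ascii_term = {"type": "literal", "value": "Belgie spelled incorrectly"}
non_ascii_term = {
    "type": "typed-literal",
    "datatype": XSD_STRING,
    "value": "België spelled correctly",
}

normalized_ascii = normalize_term(ascii_term)
normalized_non_ascii = normalize_term(non_ascii_term)

# After normalization both terms have the same shape.
print(normalized_ascii)
print(normalized_non_ascii)
```

This is a workaround on the consumer side and leaves the server output as it is.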
That's indeed the issue, thanks for clarifying!
Yes, we are preparing a fix for that issue ...
Sorry to ask, any updates on this?