sparql.anything
Is it possible to generate blank nodes from data references?
The behavior should be similar to the one in RML:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example/> .
@prefix : <http://example.org/> .
@base <http://example.org/> .
:firstTM a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "data.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [
        rml:reference "c1" ;
        rr:termType rr:BlankNode
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:p ;
        rr:objectMap [
            rr:template "http://example/{c2}"
        ]
    ] .
Input:
c1,c2
b0,A
Output:
_:b0 ex:p ex:A
You can just construct bnodes:
PREFIX ex: <http://example/>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
CONSTRUCT {
  [] ex:p ?A
} WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    [] xyz:c2 ?A
  }
}
or, if you want to control the bnode identifier for some reason:
PREFIX ex: <http://example/>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
CONSTRUCT {
  ?bnode ex:p ?A
} WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    [] xyz:c1 ?b0 ; xyz:c2 ?A
  }
  BIND ( BNODE ( ?b0 ) AS ?bnode )
}
I had got that far, yes, but you cannot take the blank node identifier from the input source, right?
You can take it from there, as the second query shows. I am not sure I get the use case, though. Do you mean that you want to keep the blank node identifier in the generated graph? The generated blank node IDs depend on the serialiser: bnode identifiers are supposed to be local and are usually generated during serialisation or data loading. So what is the point of forcing them? If you want to mint an identifier, you probably want an IRI instead. Am I getting it right?
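To illustrate the IRI route, here is a minimal sketch that mints subject IRIs from c1 instead of blank nodes. The namespace http://example/id/ is a placeholder assumed for illustration; it does not come from the test case or from SPARQL Anything. The object IRI is built from c2 so that it mirrors the rr:template in the RML mapping.

```sparql
PREFIX ex:  <http://example/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

CONSTRUCT {
  ?subject ex:p ?object
} WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    [] xyz:c1 ?c1 ; xyz:c2 ?c2
  }
  # Assumption: http://example/id/ is a hypothetical namespace for minted subjects.
  BIND ( IRI(CONCAT("http://example/id/", ENCODE_FOR_URI(?c1))) AS ?subject )
  # Mirrors rr:template "http://example/{c2}" from the RML mapping.
  BIND ( IRI(CONCAT(STR(ex:), ENCODE_FOR_URI(?c2))) AS ?object )
}
```

Unlike a blank node label, an IRI minted this way stays the same across runs and serialisers, so two outputs can be compared triple by triple.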
you could do this:
curl --silent 'http://localhost:3000/sparql.anything' \
  --header "Accept: text/csv" \
  --data-urlencode 'query=
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
SELECT *
WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "/app/input.csv" ;
                  fx:csv.headers true .
    ?s ?p ?o
    BIND(iri(?s) AS ?s_iri)
  }
}
'
yielding:
s | p | o | s_iri |
---|---|---|---|
_:b0 | http://sparql.xyz/facade-x/data/c1 | b0 | _:file:/app/input.csv##row1 |
_:b0 | http://sparql.xyz/facade-x/data/c2 | A | _:file:/app/input.csv##row1 |
_:b1 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://sparql.xyz/facade-x/ns/root | _:file:/app/input.csv# |
_:b1 | http://www.w3.org/1999/02/22-rdf-syntax-ns#_1 | _:b0 | _:file:/app/input.csv# |
oh, i know what you want now. one minute.
it appears that apache jena does not let you synthesize a bnode identifier manually. this is as close as i can get but neither quad is what you are looking for (one isn't a well formed quad and i'm not sure about the other). though i think an actual IRI is what i would use in practice.
curl --silent 'http://localhost:3000/sparql.anything' \
  --header "Accept: application/n-quads" \
  --data-urlencode 'query=
PREFIX : <http://example.com/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
CONSTRUCT {
  ?new_s_iri :p ?new_c2 .
  ?new_s_str :p ?new_c2 .
}
WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "/app/input.csv" ;
                  fx:csv.headers true .
    ?s xyz:c1 ?c1 ;
       xyz:c2 ?c2
    BIND(iri(concat("_:", ?c1)) AS ?new_s_iri)
    BIND(concat("_:", ?c1) AS ?new_s_str)
    BIND(iri(concat(str(:), ?c2)) AS ?new_c2)
  }
}
'
yields:
"_:b0" <http://example.com/p> <http://example.com/A> .
<_:b0> <http://example.com/p> <http://example.com/A> .
@justin2004 yeah, exactly! I was able to obtain the same results, but I don't think that any of the results are valid RDF, right?
Just so you know, this comes from one of the R2RML test cases: https://www.w3.org/2001/sw/rdb2rdf/test-cases/#R2RMLTC0002b. It is not that I specifically want this feature in the engine; it is more about comparing both solutions. One of the main benefits of the feature is that identifiers do not have to be kept in memory during execution.
I don't think it is possible to control the blank nodes that are generated by the serializer, but this is probably a question for [email protected].
However, while playing with this use case I found an interesting issue that shows up when one wants to generate multiple triples with the same bnode across different CONSTRUCT template projections. At the moment, a new bnode is generated for every projection, even if we use the BNODE function. This is reproducible by adding more rows to the example CSV: a new bnode is created for each of them. I will open a separate issue for that.
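For context, the SPARQL 1.1 definition of BNODE(label) only guarantees the same blank node for calls with the same label within a single solution mapping; calls made for different solutions always yield distinct blank nodes. A minimal sketch of what that scoping does allow, assuming a hypothetical extra predicate ex:id that is not part of the original example:

```sparql
PREFIX ex:  <http://example/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

CONSTRUCT {
  # Both triples use ?bnode, so within one row they share one blank node.
  ?bnode ex:p  ?c2 ;
         ex:id ?c1 .   # ex:id is a hypothetical predicate, for illustration only
} WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "./data.csv" ; fx:csv.headers true .
    [] xyz:c1 ?c1 ; xyz:c2 ?c2
  }
  # One BNODE call per solution (i.e. per CSV row).
  BIND ( BNODE(?c1) AS ?bnode )
}
```

Under that definition, two rows with the same c1 value would still be expected to get two different blank nodes, because each row is a separate solution. Sharing a node across rows would probably need an IRI rather than a bnode.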
At the moment, a new bnode is generated for every projection, even if we use the BNODE function.
I thought I just wasn't understanding how to use bnode() with an argument but since you might have also expected different behavior I opened an issue: https://issues.apache.org/jira/browse/JENA-2340
Just so you know, this comes from one of the R2RML test cases: https://www.w3.org/2001/sw/rdb2rdf/test-cases/#R2RMLTC0002b. It is not that I specifically want this feature in the engine; it is more about comparing both solutions.
Considering they are bnodes, the comparison can be done via graph isomorphism (there are some useful utils for this in Jena).