sparql.anything
Concatenating variables from different levels of iteration in a JSON
Hi, I'm having some issues generating a subject that concatenates variables from different levels of this JSON file. I want to concatenate into a single URI (?rel_subject) the fields $.owner.excerpt and $.name.excerpt, together with each of the tag-name values in the releases array ($.releases.excerpt.*.tagName), resulting in:
- https://www.w3id.org/okn/i/SoftwareVersion/oeg-upm/morph-csv/v0.1.0
- https://www.w3id.org/okn/i/SoftwareVersion/oeg-upm/morph-csv/v1.0.0
- https://www.w3id.org/okn/i/SoftwareVersion/oeg-upm/morph-csv/v1.1.0
This is the complete query. I made sure the SERVICE clause for the releases field extracts the values, but then the BIND doesn't perform the concat I want.
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX ex: <http://example.org/>
PREFIX sd: <https://w3id.org/okn/o/sd#>
PREFIX em: <https://www.w3id.org/okn/o/em#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
?rel_subject a sd:SoftwareVersion ;
sd:tag ?rel_tagName ;
sd:name ?rel_name ;
sd:author ?rel_authorName_uri ;
sd:description ?rel_body ;
sd:dateCreated ?rel_dateCreated ;
sd:datePublished ?rel_datePublised ;
sd:hasDownloadURL ?rel_zipballUrl ;
sd:identifier ?rel_url .
?rel_authorName_uri rdfs:label ?rel_authorName .
?subject a sd:Software ;
sd:hasVersion ?rel_subject .
}
WHERE
{ SERVICE <x-sparql-anything:https://raw.githubusercontent.com/oeg-upm/rdf-star-generation/main/use-cases/somef/separated_files/sample_input_one_repo.json,json.path=$.owner>
{
[] xyz:excerpt ?owner ;
xyz:confidence [ fx:anySlot ?owner_confidence ] ;
xyz:technique ?owner_technique .
}
BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",?owner)) as ?owner_uri)
SERVICE <x-sparql-anything:https://raw.githubusercontent.com/oeg-upm/rdf-star-generation/main/use-cases/somef/separated_files/sample_input_one_repo.json,json.path=$.name>
{
[] xyz:excerpt ?name .
}
BIND(uri(concat("https://www.w3id.org/okn/i/Software/",encode_for_uri(?owner),"/",encode_for_uri(?name))) as ?subject)
BIND(uri(concat("https://www.w3id.org/okn/i/SourceCode/",encode_for_uri(?owner),"/",encode_for_uri(?name))) as ?source_subject)
OPTIONAL
{
SERVICE <x-sparql-anything:https://raw.githubusercontent.com/oeg-upm/rdf-star-generation/main/use-cases/somef/separated_files/sample_input_one_repo.json,json.path=$.releases>
{
[] xyz:excerpt [
fx:anySlot [
xyz:tagName ?rel_tagName ;
xyz:name ?rel_name ;
xyz:authorName ?rel_authorName ;
xyz:body ?rel_body ;
xyz:dateCreated ?rel_dateCreated ;
xyz:datePublished ?rel_datePublised ;
xyz:zipballUrl ?rel_zipballUrl ;
xyz:url ?rel_url ] ].
}
BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",encode_for_uri(?rel_authorName))) as ?rel_authorName_uri)
BIND(uri(concat("https://www.w3id.org/okn/i/SoftwareVersion/",encode_for_uri(?owner),"/",encode_for_uri(?name),"/",encode_for_uri(?rel_tagName))) as ?rel_subject)
}
}
I've tried changing the json.path to $.releases and to $.releases.excerpt.* (with the corresponding changes in the facade), but neither of them makes the BIND work.
Hi @anaigmo,
You are in the land of facades now so you don't need json paths like you do in RML. :) You can use SPARQL to destructure the json (well the RDF facade of it).
In your query ?owner gets bound to a bnode, so you can't do this:
BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",?owner)) as ?owner_uri)
so below I generated a UUID for each owner instead.
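A lighter alternative to the UUID, sketched below (assuming, as in the sample file, that the owner's name is a literal in the excerpt field; the file name is a placeholder), is to walk into the bnode and concatenate its literal:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

# ?owner is a bnode (a Facade-X container), so concat() on it fails;
# walk into the container and use the literal excerpt instead.
SELECT ?owner_uri WHERE {
  SERVICE <x-sparql-anything:sample.json> {
    ?s xyz:owner/xyz:excerpt ?o_excerpt .   # a literal, e.g. "oeg-upm"
  }
  BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",
                  encode_for_uri(?o_excerpt))) AS ?owner_uri)
}
```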
This query:
curl --silent 'http://localhost:3000/sparql.anything' \
--header "Accept: text/csv" \
--data-urlencode 'query=
PREFIX sd: <https://w3id.org/okn/o/sd#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX i: <https://www.w3id.org/okn/i/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT
{
?rel_subject rdf:type sd:SoftwareVersion .
?rel_subject sd:tag ?rel_tagName .
?rel_subject sd:name ?rel_name .
?rel_subject sd:author ?rel_authorName_uri .
?rel_subject sd:description ?rel_body .
?rel_subject sd:dateCreated ?rel_dateCreated .
?rel_subject sd:datePublished ?rel_datePublised .
?rel_subject sd:hasDownloadURL ?rel_zipballUrl .
?rel_subject sd:identifier ?rel_url .
?rel_authorName_uri rdfs:label ?rel_authorName .
?subject rdf:type sd:Software .
?subject sd:hasVersion ?rel_subject .
}
WHERE
{ SERVICE <x-sparql-anything:>
{ fx:properties
fx:location "/app/sample.json" .
?s xyz:owner ?owner .
BIND(struuid() AS ?owner_uuid)
?owner xyz:excerpt ?o_excerpt .
?s xyz:name/xyz:excerpt ?name_excerpt .
?s xyz:releases/xyz:excerpt/fx:anySlot ?release .
?release xyz:tagName ?rel_tagName ;
xyz:name ?rel_name ;
xyz:authorName ?rel_authorName ;
xyz:body ?rel_body ;
xyz:dateCreated ?rel_dateCreated ;
xyz:datePublished ?rel_datePublised ;
xyz:zipballUrl ?rel_zipballUrl ;
xyz:url ?rel_url .
?s xyz:citation/fx:anySlot ?cits .
?cits xyz:confidence/fx:anySlot* ?confidence .
?cits xyz:technique ?technique
FILTER isLiteral(?confidence)
BIND(uri(concat(str(i:), "Agent/", ?owner_uuid)) AS ?owner_uri)
BIND(uri(concat(str(i:), "Software/", ?owner_uuid, "/", encode_for_uri(?name_excerpt))) AS ?subject)
BIND(uri(concat(str(i:), "SourceCode/", ?owner_uuid, "/", encode_for_uri(?name_excerpt))) AS ?source_subject)
BIND(uri(concat(str(i:), "Agent/", encode_for_uri(?rel_authorName))) AS ?rel_authorName_uri)
BIND(uri(concat(str(i:), "SoftwareVersion/", ?owner_uuid, "/", encode_for_uri(?name_excerpt), "/", encode_for_uri(?rel_tagName))) AS ?rel_subject)
}
}
'
produces these triples:
@prefix fx: <http://sparql.xyz/facade-x/ns/> .
@prefix i: <https://www.w3id.org/okn/i/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sd: <https://w3id.org/okn/o/sd#> .
@prefix xyz: <http://sparql.xyz/facade-x/data/> .
<https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.0.0>
rdf:type sd:SoftwareVersion ;
sd:author <https://www.w3id.org/okn/i/Agent/dachafra> ;
sd:dateCreated "2020-03-27T10:38:05Z" ;
sd:datePublished "2020-03-28T11:24:50Z" ;
sd:description "First version of Morph-CSV, used for evaluating the engine for the submission of Special Issue on Storing, Querying and Benchmarking the Web of Data at the Semantic Web Journal 2020\r\nAssociated DOI: https://doi.org/10.5281/zenodo.3731941" ;
sd:hasDownloadURL "https://api.github.com/repos/oeg-upm/morph-csv/zipball/v1.0.0" ;
sd:identifier "https://api.github.com/repos/oeg-upm/morph-csv/releases/24957950" ;
sd:name "v1.0.0" ;
sd:tag "v1.0.0" .
<https://www.w3id.org/okn/i/Software/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv>
rdf:type sd:Software ;
sd:hasVersion <https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v0.1.0> , <https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.0.0> , <https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.1.0> .
<https://www.w3id.org/okn/i/Agent/dachafra>
rdfs:label "dachafra" .
<https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v0.1.0>
rdf:type sd:SoftwareVersion ;
sd:author <https://www.w3id.org/okn/i/Agent/dachafra> ;
sd:dateCreated "2020-03-12T18:11:22Z" ;
sd:datePublished "2020-03-14T12:30:47Z" ;
sd:description "First version of the engine, in java and without SPARQL query guide system" ;
sd:hasDownloadURL "https://api.github.com/repos/oeg-upm/morph-csv/zipball/v0.1.0" ;
sd:identifier "https://api.github.com/repos/oeg-upm/morph-csv/releases/24525987" ;
sd:name "v0.1.0" ;
sd:tag "v0.1.0" .
<https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.1.0>
rdf:type sd:SoftwareVersion ;
sd:author <https://www.w3id.org/okn/i/Agent/dachafra> ;
sd:dateCreated "2020-11-06T10:06:53Z" ;
sd:datePublished "2020-11-06T10:08:08Z" ;
sd:description "Version for the major revision of the paper at the semantic web journal.\r\nNow it's possible to run ?s ?p ?o query over morph-csv to obtain the full RDB instance." ;
sd:hasDownloadURL "https://api.github.com/repos/oeg-upm/morph-csv/zipball/v1.1.0" ;
sd:identifier "https://api.github.com/repos/oeg-upm/morph-csv/releases/33546985" ;
sd:name "v1.1.0" ;
sd:tag "v1.1.0" .
Does that look like what you expect?
Hi @justin2004,
I guess I'm still attached to JSONPath :) Your solution works as I want. I just made a little change: instead of the uuid, I did the concat with ?o_excerpt, and I get the desired output. Thanks for the help!
Hi @justin2004, thank you very much for the help! We were struggling with this issue for a long time because we didn't find any example in the documentation, so I would suggest that more complex examples (at least for JSON) be added @enridaga
great! hey, after you all have had some time destructuring json (and yaml, html, etc.) with SPARQL i'd love to chat about your experience (what is hard, what is easy, etc.).
also you can join the general discussion here.
My two cents: very useful for simple cases (more than RML), but really difficult for complex data integration systems (and very complex to clean, modify and debug a query) in comparison with YARRRML, for example. And it's a bit weird that the same things can be done in different ways (facade vs. JSONPath, etc.). We will publish our thoughts soon ;-)
i think using multiple service clauses each with a jsonpath was not exactly the intended use of jsonpath. i think of jsonpath as a way to move to a subtree so you can focus your query there. plus i think if you use a jsonpath to isolate a subtree the rest of the tree might not get triplified (saving time)... so it is an optimization (is that correct @enridaga?).
and for complex data integration my team has found doing multiple stages of construct queries works well. doing a single monster construct query isn't what i'd recommend.
about modifying and debugging a construct query... the skills required to do that are the same skills required to select data out of a graph. we found that using construct queries to uplift data improved our sparql querying skills. if someone has a thoughtfully modeled RDF graph they'll probably want some graph querying skills.
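A minimal sketch of such staging (the file names and the intermediate shape are assumptions, not part of this thread):

```sparql
# Stage 1: uplift the raw JSON into a simple intermediate graph,
# saving the result, e.g. as stage1.ttl.
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX ex:  <http://example.org/>
CONSTRUCT { [] ex:tagName ?tag ; ex:name ?name }
WHERE {
  SERVICE <x-sparql-anything:sample.json> {
    ?s xyz:releases/xyz:excerpt/fx:anySlot ?release .
    ?release xyz:tagName ?tag ; xyz:name ?name .
  }
}

# Stage 2: a second CONSTRUCT, run with any SPARQL engine over stage1.ttl,
# mints the final IRIs and links them into the target model.
```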
and for complex data integration my team has found doing multiple stages of construct queries works well. doing a single monster construct query isn't what i'd recommend.
But I would like to have a full overview of the graph that I'm going to generate, so this solution would not work for me, at least. It's something that I have with YARRRML (or Mapeathor), and it's more sustainable and easier to modify IMO.
about modifying and debugging a construct query... the skills required to do that are the same skills required to select data out of a graph. we found that using construct queries to uplift data improved our sparql querying skills. if someone has a thoughtfully modeled RDF graph they'll probably want some graph querying skills.
Well, I think the role of a data/KG engineer is to integrate the data, and then other users/roles are the ones who actually exploit the data with SPARQL. IMO, the KG engineer has a medium knowledge of SPARQL by default, but they do not necessarily have to know the very specific operators that this approach demands.
On the support for JsonPath: it is there with the sole purpose of pruning the input file before the Facade-X interpretation. Crucially, it changes the way the JSON is interpreted, therefore I only recommend using it for pruning very large JSONs or as a rationale for the slicing option. Essentially, everything from the matching node onwards is transformed as if all matches were in a single JsonArray. A full example is in the documentation (under preparation for v0.8.0): https://github.com/SPARQL-Anything/sparql.anything/blob/v0.8-DEV/docs/formats/JSON.md
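To illustrate the pruning (a sketch; the file name is a placeholder): with json.path=$.releases.excerpt.*, only the matched release objects are triplified, gathered as if they were in a single JsonArray, so the facade root is a container whose slots are the matches:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>

# The root [] is the synthetic array wrapping all json.path matches;
# everything outside $.releases.excerpt.* is never triplified.
SELECT ?tag WHERE {
  SERVICE <x-sparql-anything:sample.json,json.path=$.releases.excerpt.*> {
    [] fx:anySlot/xyz:tagName ?tag .
  }
}
```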
I tend to create a single large SPARQL query, and I have many working examples. Troubleshooting has never been a problem for me, as I can easily comment out SPARQL code blocks or switch to SELECT to make sure all variables are populated accordingly. But I agree that a guide on troubleshooting techniques would help users.
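For instance (a sketch over the release pattern used earlier in this thread; the file name is a placeholder), the debugging switch is just replacing the CONSTRUCT template with SELECT * while keeping the WHERE clause:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>

# Swap CONSTRUCT for SELECT * to check that every variable is bound
# as expected before minting any triples.
SELECT * WHERE {
  SERVICE <x-sparql-anything:sample.json> {
    ?s xyz:releases/xyz:excerpt/fx:anySlot ?release .
    ?release xyz:tagName ?rel_tagName ;
             xyz:name    ?rel_name .
  }
}
```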
I think we can close this one.