sparql.anything
Concatenating variables from different levels of iteration in a JSON
Hi, I'm having some issues generating a subject that concatenates variables from different levels of this JSON file. I want to concatenate into a single URI (?rel_subject) the fields $.owner.excerpt and $.name.excerpt, together with each of the tag-name values in the releases array ($.releases.excerpt.*.tagName), resulting in:
- https://www.w3id.org/okn/i/SoftwareVersion/oeg-upm/morph-csv/v0.1.0
- https://www.w3id.org/okn/i/SoftwareVersion/oeg-upm/morph-csv/v1.0.0
- https://www.w3id.org/okn/i/SoftwareVersion/oeg-upm/morph-csv/v1.1.0
This is the complete query. I made sure the SERVICE clause for the releases field extracts the values, but then the BIND doesn't perform the concat I want.
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX ex: <http://example.org/>
PREFIX sd: <https://w3id.org/okn/o/sd#>
PREFIX em: <https://www.w3id.org/okn/o/em#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
?rel_subject a sd:SoftwareVersion ;
sd:tag ?rel_tagName ;
sd:name ?rel_name ;
sd:author ?rel_authorName_uri ;
sd:description ?rel_body ;
sd:dateCreated ?rel_dateCreated ;
sd:datePublished ?rel_datePublised ;
sd:hasDownloadURL ?rel_zipballUrl ;
sd:identifier ?rel_url .
?rel_authorName_uri rdfs:label ?rel_authorName .
?subject a sd:Software ;
sd:hasVersion ?rel_subject .
}
WHERE
{ SERVICE <x-sparql-anything:https://raw.githubusercontent.com/oeg-upm/rdf-star-generation/main/use-cases/somef/separated_files/sample_input_one_repo.json,json.path=$.owner>
{
[] xyz:excerpt ?owner ;
xyz:confidence [ fx:anySlot ?owner_confidence ] ;
xyz:technique ?owner_technique .
}
BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",?owner)) as ?owner_uri)
SERVICE <x-sparql-anything:https://raw.githubusercontent.com/oeg-upm/rdf-star-generation/main/use-cases/somef/separated_files/sample_input_one_repo.json,json.path=$.name>
{
[] xyz:excerpt ?name .
}
BIND(uri(concat("https://www.w3id.org/okn/i/Software/",encode_for_uri(?owner),"/",encode_for_uri(?name))) as ?subject)
BIND(uri(concat("https://www.w3id.org/okn/i/SourceCode/",encode_for_uri(?owner),"/",encode_for_uri(?name))) as ?source_subject)
OPTIONAL
{
SERVICE <x-sparql-anything:https://raw.githubusercontent.com/oeg-upm/rdf-star-generation/main/use-cases/somef/separated_files/sample_input_one_repo.json,json.path=$.releases>
{
[] xyz:excerpt [
fx:anySlot [
xyz:tagName ?rel_tagName ;
xyz:name ?rel_name ;
xyz:authorName ?rel_authorName ;
xyz:body ?rel_body ;
xyz:dateCreated ?rel_dateCreated ;
xyz:datePublished ?rel_datePublised ;
xyz:zipballUrl ?rel_zipballUrl ;
xyz:url ?rel_url ] ].
}
BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",encode_for_uri(?rel_authorName))) as ?rel_authorName_uri)
BIND(uri(concat("https://www.w3id.org/okn/i/SoftwareVersion/",encode_for_uri(?owner),"/",encode_for_uri(?name),"/",encode_for_uri(?rel_tagName))) as ?rel_subject)
}
}
I've tried changing the json.path to $.releases and to $.releases.excerpt.* (with the corresponding changes in the facade), but neither of them makes the BIND work.
Hi @anaigmo,
You are in the land of facades now so you don't need json paths like you do in RML. :) You can use SPARQL to destructure the json (well the RDF facade of it).
In your query ?owner gets bound to a bnode, so you can't do this:
BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",?owner)) as ?owner_uri)
so below I generated a UUID for each owner instead.
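A lighter alternative to the UUID, sketched below (assuming, as in the sample file, that the owner's name is a literal in the excerpt field; the file name is a placeholder), is to walk into the bnode and concatenate its literal:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

# ?owner is a bnode (a Facade-X container), so concat() on it fails;
# walk into the container and use the literal excerpt instead.
SELECT ?owner_uri WHERE {
  SERVICE <x-sparql-anything:sample.json> {
    ?s xyz:owner/xyz:excerpt ?o_excerpt .   # a literal, e.g. "oeg-upm"
  }
  BIND(uri(concat("https://www.w3id.org/okn/i/Agent/",
                  encode_for_uri(?o_excerpt))) AS ?owner_uri)
}
```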
This query:
curl --silent 'http://localhost:3000/sparql.anything' \
--header "Accept: text/csv" \
--data-urlencode 'query=
PREFIX sd: <https://w3id.org/okn/o/sd#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX i: <https://www.w3id.org/okn/i/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT
{
?rel_subject rdf:type sd:SoftwareVersion .
?rel_subject sd:tag ?rel_tagName .
?rel_subject sd:name ?rel_name .
?rel_subject sd:author ?rel_authorName_uri .
?rel_subject sd:description ?rel_body .
?rel_subject sd:dateCreated ?rel_dateCreated .
?rel_subject sd:datePublished ?rel_datePublised .
?rel_subject sd:hasDownloadURL ?rel_zipballUrl .
?rel_subject sd:identifier ?rel_url .
?rel_authorName_uri rdfs:label ?rel_authorName .
?subject rdf:type sd:Software .
?subject sd:hasVersion ?rel_subject .
}
WHERE
{ SERVICE <x-sparql-anything:>
{ fx:properties
fx:location "/app/sample.json" .
?s xyz:owner ?owner .
BIND(struuid() AS ?owner_uuid)
?owner xyz:excerpt ?o_excerpt .
?s xyz:name/xyz:excerpt ?name_excerpt .
?s xyz:releases/xyz:excerpt/fx:anySlot ?release .
?release xyz:tagName ?rel_tagName ;
xyz:name ?rel_name ;
xyz:authorName ?rel_authorName ;
xyz:body ?rel_body ;
xyz:dateCreated ?rel_dateCreated ;
xyz:datePublished ?rel_datePublised ;
xyz:zipballUrl ?rel_zipballUrl ;
xyz:url ?rel_url .
?s xyz:citation/fx:anySlot ?cits .
?cits xyz:confidence/fx:anySlot* ?confidence .
?cits xyz:technique ?technique
FILTER isLiteral(?confidence)
BIND(uri(concat(str(i:), "Agent/", ?owner_uuid)) AS ?owner_uri)
BIND(uri(concat(str(i:), "Software/", ?owner_uuid, "/", encode_for_uri(?name_excerpt))) AS ?subject)
BIND(uri(concat(str(i:), "SourceCode/", ?owner_uuid, "/", encode_for_uri(?name_excerpt))) AS ?source_subject)
BIND(uri(concat(str(i:), "Agent/", encode_for_uri(?rel_authorName))) AS ?rel_authorName_uri)
BIND(uri(concat(str(i:), "SoftwareVersion/", ?owner_uuid, "/", encode_for_uri(?name_excerpt), "/", encode_for_uri(?rel_tagName))) AS ?rel_subject)
}
}
'
produces these triples:
@prefix fx: <http://sparql.xyz/facade-x/ns/> .
@prefix i: <https://www.w3id.org/okn/i/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sd: <https://w3id.org/okn/o/sd#> .
@prefix xyz: <http://sparql.xyz/facade-x/data/> .
<https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.0.0>
rdf:type sd:SoftwareVersion ;
sd:author <https://www.w3id.org/okn/i/Agent/dachafra> ;
sd:dateCreated "2020-03-27T10:38:05Z" ;
sd:datePublished "2020-03-28T11:24:50Z" ;
sd:description "First version of Morph-CSV, used for evaluating the engine for the submission of Special Issue on Storing, Querying and Benchmarking the Web of Data at the Semantic Web Journal 2020\r\nAssociated DOI: https://doi.org/10.5281/zenodo.3731941" ;
sd:hasDownloadURL "https://api.github.com/repos/oeg-upm/morph-csv/zipball/v1.0.0" ;
sd:identifier "https://api.github.com/repos/oeg-upm/morph-csv/releases/24957950" ;
sd:name "v1.0.0" ;
sd:tag "v1.0.0" .
<https://www.w3id.org/okn/i/Software/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv>
rdf:type sd:Software ;
sd:hasVersion <https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v0.1.0> , <https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.0.0> , <https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.1.0> .
<https://www.w3id.org/okn/i/Agent/dachafra>
rdfs:label "dachafra" .
<https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v0.1.0>
rdf:type sd:SoftwareVersion ;
sd:author <https://www.w3id.org/okn/i/Agent/dachafra> ;
sd:dateCreated "2020-03-12T18:11:22Z" ;
sd:datePublished "2020-03-14T12:30:47Z" ;
sd:description "First version of the engine, in java and without SPARQL query guide system" ;
sd:hasDownloadURL "https://api.github.com/repos/oeg-upm/morph-csv/zipball/v0.1.0" ;
sd:identifier "https://api.github.com/repos/oeg-upm/morph-csv/releases/24525987" ;
sd:name "v0.1.0" ;
sd:tag "v0.1.0" .
<https://www.w3id.org/okn/i/SoftwareVersion/b1ec2ca4-9dc9-47e1-818e-a69a4d90c480/morph-csv/v1.1.0>
rdf:type sd:SoftwareVersion ;
sd:author <https://www.w3id.org/okn/i/Agent/dachafra> ;
sd:dateCreated "2020-11-06T10:06:53Z" ;
sd:datePublished "2020-11-06T10:08:08Z" ;
sd:description "Version for the major revision of the paper at the semantic web journal.\r\nNow it's possible to run ?s ?p ?o query over morph-csv to obtain the full RDB instance." ;
sd:hasDownloadURL "https://api.github.com/repos/oeg-upm/morph-csv/zipball/v1.1.0" ;
sd:identifier "https://api.github.com/repos/oeg-upm/morph-csv/releases/33546985" ;
sd:name "v1.1.0" ;
sd:tag "v1.1.0" .
Does that look like what you expect?
Hi @justin2004,
I guess I'm still attached to JSONPath :) Your solution works as I want. I just made a little change: instead of the uuid, I did the concat with ?o_excerpt, and I get the desired output. Thanks for the help!
Hi @justin2004, thank you very much for the help! We were struggling with this issue for a long time because we didn't find any example in the documentation, so I would suggest that more complex examples (at least for JSON) be added @enridaga
great! hey, after you all have had some time destructuring json (and yaml, html, etc.) with SPARQL i'd love to chat about your experience (what is hard, what is easy, etc.).
also you can join the general discussion here.
My two cents: very useful for simple cases (more than RML), but really difficult for complex data integration systems (and very complex to clean, modify and debug a query) in comparison with YARRRML, for example. And it's a bit weird that the same things can be done in different ways (facade vs. JSONPath, etc.). We will publish our thoughts soon ;-)
i think using multiple service clauses each with a jsonpath was not exactly the intended use of jsonpath. i think of jsonpath as a way to move to a subtree so you can focus your query there. plus i think if you use a jsonpath to isolate a subtree the rest of the tree might not get triplified (saving time)... so it is an optimization (is that correct @enridaga?).
and for complex data integration my team has found doing multiple stages of construct queries works well. doing a single monster construct query isn't what i'd recommend.
about modifying and debugging a construct query... the skills required to do that are the same skills required to select data out of a graph. we found that using construct queries to uplift data improved our sparql querying skills. if someone has a thoughtfully modeled RDF graph they'll probably want some graph querying skills.
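A minimal sketch of such staging (the file names and the intermediate shape are assumptions, not part of this thread):

```sparql
# Stage 1: uplift the raw JSON into a simple intermediate graph,
# saving the result, e.g. as stage1.ttl.
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX ex:  <http://example.org/>
CONSTRUCT { [] ex:tagName ?tag ; ex:name ?name }
WHERE {
  SERVICE <x-sparql-anything:sample.json> {
    ?s xyz:releases/xyz:excerpt/fx:anySlot ?release .
    ?release xyz:tagName ?tag ; xyz:name ?name .
  }
}

# Stage 2: a second CONSTRUCT, run with any SPARQL engine over stage1.ttl,
# mints the final IRIs and links them into the target model.
```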
and for complex data integration my team has found doing multiple stages of construct queries works well. doing a single monster construct query isn't what i'd recommend.
But I would like to have a full overview of the graph that I'm going to generate, so this solution would not work for me, at least. It's something that I have with YARRRML (or Mapeathor), and it's more sustainable and easier to modify IMO.
about modifying and debugging a construct query... the skills required to do that are the same skills required to select data out of a graph. we found that using construct queries to uplift data improved our sparql querying skills. if someone has a thoughtfully modeled RDF graph they'll probably want some graph querying skills.
Well, I think the role of a data/KG engineer is to integrate the data, and then other users/roles are the ones who actually exploit the data with SPARQL. IMO, the KG engineer has a medium knowledge of SPARQL by default, but they do not necessarily have to know the very specific operators that this approach demands.
On the support for JsonPath: it is there with the sole purpose of pruning the input file before the Facade-X interpretation. Crucially, it changes the way the JSON is interpreted, therefore I only recommend using it for pruning very large JSONs or as a rationale for the slicing option. Essentially, everything from the matching node onwards is transformed as if all matches were in a single JsonArray. A full example is in the documentation (under preparation for v0.8.0): https://github.com/SPARQL-Anything/sparql.anything/blob/v0.8-DEV/docs/formats/JSON.md
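To illustrate the pruning (a sketch; the file name is a placeholder): with json.path=$.releases.excerpt.*, only the matched release objects are triplified, gathered as if they were in a single JsonArray, so the facade root is a container whose slots are the matches:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>

# The root [] is the synthetic array wrapping all json.path matches;
# everything outside $.releases.excerpt.* is never triplified.
SELECT ?tag WHERE {
  SERVICE <x-sparql-anything:sample.json,json.path=$.releases.excerpt.*> {
    [] fx:anySlot/xyz:tagName ?tag .
  }
}
```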
I tend to create a single large SPARQL query, and I have many working examples. Troubleshooting has never been a problem for me, as I can easily comment out SPARQL code blocks or switch to SELECT to make sure all variables are populated accordingly. But I agree that a guide on troubleshooting techniques would help users.
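For instance (a sketch over the release pattern used earlier in this thread; the file name is a placeholder), the debugging switch is just replacing the CONSTRUCT template with SELECT * while keeping the WHERE clause:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx:  <http://sparql.xyz/facade-x/ns/>

# Swap CONSTRUCT for SELECT * to check that every variable is bound
# as expected before minting any triples.
SELECT * WHERE {
  SERVICE <x-sparql-anything:sample.json> {
    ?s xyz:releases/xyz:excerpt/fx:anySlot ?release .
    ?release xyz:tagName ?rel_tagName ;
             xyz:name    ?rel_name .
  }
}
```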
I think we can close this one.