Loading JSON input via -i parameter

Open milvld opened this issue 2 years ago • 3 comments

Hi

I'm trying to load an external input file via one of the command-line options, but I keep getting errors. I tried it with the query and Friends data from the documentation (both included below) and still get the same errors.

Couple of questions:

  • What am I overlooking regarding the JSON parse error? The JSON seems to validate otherwise.
  • Are you supposed to use the -i or -l parameter to load data from an (external) input file?
  • Do you have to indicate in the SERVICE clause in any way that you're using data from an input file provided via the CLI?

Use case: I'm trying to build a simple ETL pipeline in Python in which I extract data from an RDBMS, transform it with SPARQL Anything and feed it to a triple store.

Error with -i parameter

Exception in thread "main" org.apache.jena.atlas.json.JsonParseException: Not a JSON object START: [LBRACKET]

Resulting from call

java -jar ./sparql-anything-0.6.0.jar -q friends_test.sparql -i friends_test.json -f json

Error with -l parameter

Exception in thread "main" org.apache.jena.riot.RiotException: Failed to determine the content type: (URI=file:///home/milan/Documents/github_meemoo/kg_etl/friends_test.json : stream=null)

Resulting from call

java -jar ./sparql-anything-0.6.0.jar -q friends_test.sparql -l friends_test.json -f json

SPARQL-query

Saved to a local query file as friends_test.sparql.

PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>

SELECT ?seriesName
WHERE {

    SERVICE <x-sparql-anything> {
        ?tvSeries xyz:name ?seriesName .
        ?tvSeries xyz:stars ?star .
        ?star fx:anySlot "Courteney Cox" .
    }

} 

Data

Saved to a local input file as friends_test.json.

[
  {
    "name":"Friends",
    "genres":[
      "Comedy",
      "Romance"
    ],
    "language":"English",
    "status":"Ended",
    "premiered":"1994-09-22",
    "summary":"Follows the personal and professional lives of six twenty to thirty-something-year-old friends living in Manhattan.",
    "stars":[
      "Jennifer Aniston",
      "Courteney Cox",
      "Lisa Kudrow",
      "Matt LeBlanc",
      "Matthew Perry",
      "David Schwimmer"
    ]
  },
  {
    "name":"Cougar Town",
    "genres":[
      "Comedy",
      "Romance"
    ],
    "language":"English",
    "status":"Ended",
    "premiered":"2009-09-23",
    "summary":"Jules is a recently divorced mother who has to face the unkind realities of dating in a world obsessed with beauty and youth. As she becomes older, she starts discovering herself.",
    "stars":[
      "Courteney Cox",
      "David Arquette",
      "Bill Lawrence",
      "Linda Videtti Figueiredo",
      "Blake McCormick"
    ]
  }
]
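As a sanity check on what the query is expected to return, the same selection can be written in plain Python over the data above. This is only a sketch of the intended result, not of how SPARQL Anything evaluates the query; the function name `series_with_star` is hypothetical, and the records are trimmed to the two fields the query uses.

```python
import json

# Trimmed copy of friends_test.json: only the fields the query touches.
DATA = json.loads("""
[
  {"name": "Friends", "stars": ["Jennifer Aniston", "Courteney Cox", "Lisa Kudrow"]},
  {"name": "Cougar Town", "stars": ["Courteney Cox", "David Arquette"]}
]
""")


def series_with_star(series, star):
    """Return the names of all series whose "stars" list contains `star`."""
    return [s["name"] for s in series if star in s.get("stars", [])]


print(series_with_star(DATA, "Courteney Cox"))
```

Both series list Courteney Cox, so both names should come back from the SPARQL query as well.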

milvld avatar May 03 '22 14:05 milvld

Executing FX transformations outside service clauses is not supported at the moment.

One way could be to change the query as follows:

PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>

SELECT ?seriesName
WHERE {

    SERVICE <x-sparql-anything:> {
        fx:properties fx:location ?_file ; fx:media-type "application/json" .
        ?tvSeries xyz:name ?seriesName .
        ?tvSeries xyz:stars ?star .
        ?star fx:anySlot "Courteney Cox" .
    }
} 

and pass the value for the ?_file basil template variable via an input parameter:

fx -q query.sparql -v file=friends_test.json

(Not tested)
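For the Python pipeline mentioned in the question, the call could be wrapped roughly as follows. A minimal sketch, assuming the jar path from the original post and that `-v name=value` behaves as described above; `build_command` and `run_query` are hypothetical helper names, and this has not been run against the tool either.

```python
import subprocess

JAR = "./sparql-anything-0.6.0.jar"  # jar path taken from the original post


def build_command(query_file, name, value, jar=JAR):
    """Build the argument list, binding one template variable via -v name=value."""
    return ["java", "-jar", jar, "-q", query_file, "-v", f"{name}={value}"]


def run_query(query_file, name, value):
    """Run the CLI and return its standard output; raises on a non-zero exit code."""
    result = subprocess.run(
        build_command(query_file, name, value),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only prints the command; call run_query(...) to actually execute it.
print(build_command("query.sparql", "file", "friends_test.json"))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.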

The option -i is limited to SPARQL result set files to be used as input for parametrised queries (where the bindings in the result set have the same name as basil template variables).
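For reference, a SPARQL result set in the W3C JSON results format is a JSON object rather than a top-level array, which is consistent with the `Not a JSON object START: [LBRACKET]` error above. A sketch of writing such a structure from Python, assuming the binding should be named `file` to match `?_file`; `make_result_set` is a hypothetical helper, and whether plain literals are the term type the tool expects is an assumption.

```python
import json


def make_result_set(rows):
    """Wrap a list of {variable: value} dicts as SPARQL 1.1 JSON results."""
    variables = sorted({name for row in rows for name in row})
    bindings = [
        {name: {"type": "literal", "value": str(value)} for name, value in row.items()}
        for row in rows
    ]
    return {"head": {"vars": variables}, "results": {"bindings": bindings}}


result_set = make_result_set([{"file": "friends_test.json"}])
print(json.dumps(result_set, indent=2))
```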

The option -l is limited to RDF data to be loaded in memory before running the SPARQL Anything query. The query is then executed against that in-memory data.

enridaga avatar May 03 '22 14:05 enridaga

@enridaga : Cool, it works!

I misread the documentation regarding the -l parameter; I thought it accepted any type of input file (as long as its media/mime type was specified with the -f parameter), not just data already in RDF.

Anyway, I'll experiment a bit further using the -v parameter and the basil template variable.

Do the basil template variables allow multiple inputs like this? Or is it limited to one single input file?

milvld avatar May 04 '22 13:05 milvld

Do the basil template variables allow multiple inputs like this? Or is it limited to one single input file?

Sorry @milvld, I did not reply to this one and only stumbled on it now! The option currently supports a single file only.

enridaga avatar Aug 01 '22 14:08 enridaga

I think we can close this one.

enridaga avatar Sep 08 '22 15:09 enridaga

The CLI parameter --input|-i is being deprecated (see #277). The same functionality is available with the option --values|-v, which now also accepts a single SPARQL result set file.

enridaga avatar Sep 08 '22 15:09 enridaga