sparql.anything
csv option proposal: fx:csv.triple-patterns
Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.
In order to allow an easy transition (for users) from Tarql to SPARQL Anything, what if we add an option for csv files that would do the following...
justin@parens$ cat proposal.csv
name,age,dog
bob,32,fido
jane,,sammy
In order to capture the values we currently need to express a triple pattern for each column like:
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

SELECT *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
          fx:location "/app/proposal.csv" ;
          fx:csv.headers "true" ;
          fx:csv.null-string "" .
        optional { ?row xyz:name ?name . }
        optional { ?row xyz:age ?age . }
        optional { ?row xyz:dog ?dog . }
      }
  }
which yields:
row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido
The proposal is to allow this query:
SELECT *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
          fx:location "/app/proposal.csv" ;
          fx:csv.headers "true" ;
          fx:csv.null-string "" ;
          fx:csv.triple-patterns "true" .
      }
  }
to produce this:
row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido
So that means fx:csv.triple-patterns "true" causes these triple patterns to be inserted implicitly behind the scenes:
optional { ?row xyz:name ?name .}
optional { ?row xyz:age ?age . }
optional { ?row xyz:dog ?dog . }
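To make the proposed expansion concrete, here is a minimal Python sketch of how the implicit patterns could be derived from the header row. It is only an illustration of the idea, not SPARQL Anything code: the function name `implicit_patterns` is hypothetical, and it assumes the header names are already usable as both property local names and variable names.

```python
import csv
import io

def implicit_patterns(csv_text, prefix="xyz"):
    """Build one OPTIONAL triple pattern per column named in the header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [
        f"OPTIONAL {{ ?row {prefix}:{name} ?{name} . }}"
        for name in header
    ]

# The sample file from this issue.
sample = "name,age,dog\nbob,32,fido\njane,,sammy\n"
patterns = implicit_patterns(sample)
print("\n".join(patterns))
```

Each column yields its own OPTIONAL block, so a missing value in one column would not drop the whole row.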
Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.
@rjyounes what if, in a single csv, one column is "state_city" and another is "state city" ? how does TARQL handle the collision?
Good question. I haven't ever encountered it. Possibly some hand-correction is required.
Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.
Indeed, currently, we are just making those strings URL-safe, which results in some unintuitive %20 appearing. Maybe we can think about adding an option to treat them as web page slugs, but even with that, there can be cases where the result is not intuitive anyway (cases, special chars, etc...).
@rjyounes what if, in a single csv, one column is "state_city" and another is "state city" ? how does TARQL handle the collision?
We already have this problem: CSVs sometimes repeat the same column name. We just append a suffix (_1, etc.), which is not great but intuitive enough.
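A rough Python sketch of how the two ideas could combine: replace spaces with underscores, then suffix any name that collides with one already seen. The exact suffixing rule SPARQL Anything uses is not spelled out in this thread, so the numbering below is an assumption, and `normalize_headers` is a hypothetical helper.

```python
def normalize_headers(headers):
    """Replace spaces with underscores, then suffix repeated names with _1, _2, ..."""
    seen = set()
    result = []
    for raw in headers:
        base = raw.strip().replace(" ", "_")
        name = base
        n = 0
        # Keep bumping the suffix until the name is unused (assumed scheme).
        while name in seen:
            n += 1
            name = f"{base}_{n}"
        seen.add(name)
        result.append(name)
    return result

print(normalize_headers(["state_city", "state city", "zip"]))
# ['state_city', 'state_city_1', 'zip']
```

With this scheme the "state_city" / "state city" collision resolves deterministically by column order, though the user still has to know which variable maps to which original column.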
Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.
OK, now on the main point. I like the idea of providing a default triple pattern. It's interesting how you would get the same behaviour with the following:
{ fx:properties
    fx:location "/app/proposal.csv" ;
    fx:csv.headers "true" .
  [] xyz:name ?name ;
     xyz:age ?age ;
     xyz:dog ?dog .
}
Without headers, we would need to add a convention for the variable names (?col_1, etc.):
{ fx:properties
    fx:location "/app/proposal.csv" ;
    fx:csv.headers "false" .
  [] rdf:_1 ?col_1 ;
     rdf:_2 ?col_2 ;
     rdf:_3 ?col_3 .
}
@enridaga and we'd need to wrap each of the triple patterns in an OPTIONAL to get the Tarql behavior.
Even if we remove the null-string option?
Oh, if we don't assert the null-string option, then that might be the Tarql behavior.
But I do know that my team likes using the null-string option with the SPARQL Anything OPTIONAL triple patterns (as they transition from Tarql to SPARQL Anything).
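The interaction between null-string and OPTIONAL can be illustrated with a toy model in Python. This is not how the engine is implemented; it only assumes that a cell equal to the null string produces no triple, and that a non-optional pattern with no matching triple rejects the row. Both function names are hypothetical.

```python
def row_triples(row, null_string=None):
    """Toy model: one (column, value) pair per cell; cells equal to null_string emit nothing."""
    return {col: val for col, val in row.items() if val != null_string}

def match(triples, columns, optional):
    """Bind each column. A missing required column rejects the row (None);
    a missing optional column is left unbound (None as the value)."""
    binding = {}
    for col in columns:
        if col in triples:
            binding[col] = triples[col]
        elif optional:
            binding[col] = None
        else:
            return None
    return binding

jane = {"name": "jane", "age": "", "dog": "sammy"}
cols = ["name", "age", "dog"]

print(match(row_triples(jane, null_string=""), cols, optional=False))  # row rejected
print(match(row_triples(jane, null_string=""), cols, optional=True))   # age unbound
print(match(row_triples(jane), cols, optional=False))                  # age bound to ""
```

In this model, dropping null-string keeps jane's row without OPTIONAL, but ?age is bound to an empty string rather than left unbound, which may or may not be what a Tarql user expects.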