sparql.anything

csv option proposal: fx:csv.triple-patterns

Open justin2004 opened this issue 2 years ago • 8 comments

Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.

In order to allow an easy transition (for users) from Tarql to SPARQL Anything, what if we add an option for csv files that would do the following...

justin@parens$ cat proposal.csv 
name,age,dog
bob,32,fido
jane,,sammy

In order to capture the values we currently need to express a triple pattern for each column like:

PREFIX fx:  <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>

SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" ;
                  fx:csv.null-string  "" .
       optional { ?row xyz:name ?name . }
       optional { ?row xyz:age ?age . }
       optional { ?row xyz:dog ?dog . }
      }
  }

which yields:

row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido

The proposal is to allow this query:

PREFIX fx:  <http://sparql.xyz/facade-x/ns/>

SELECT  *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" ;
                  fx:csv.null-string  "" ;
                  fx:csv.triple-patterns  "true" .
      }
  } 

to produce this:

row,name,age,dog
_:b0,jane,,sammy
_:b1,bob,32,fido

So that means fx:csv.triple-patterns "true" causes these triple patterns to get inserted implicitly behind the scenes:

       optional { ?row xyz:name ?name .}
       optional { ?row xyz:age ?age . }
       optional { ?row xyz:dog ?dog . }
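
To make the expansion concrete, here is a minimal Python sketch of how the implicit patterns could be derived from the header row. It is not SPARQL Anything code: the function name `implicit_patterns` and its parameters are hypothetical, and it assumes every header is already a valid SPARQL variable name and prefixed-name local part.

```python
import csv
import io


def implicit_patterns(csv_text, row_var="row", prefix="xyz"):
    """Return one OPTIONAL triple pattern per column header.

    Hypothetical helper: assumes each header is usable as-is both as a
    SPARQL variable name and as the local part of a prefixed name.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return [
        f"OPTIONAL {{ ?{row_var} {prefix}:{name} ?{name} }}"
        for name in header
    ]


sample = "name,age,dog\nbob,32,fido\njane,,sammy\n"
for pattern in implicit_patterns(sample):
    print(pattern)
```

For the sample file this prints one pattern each for name, age and dog, matching the three patterns listed above.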

justin2004, Feb 14 '23 17:02

Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.

rjyounes, Feb 17 '23 21:02

@rjyounes what if, in a single csv, one column is "state_city" and another is "state city" ? how does TARQL handle the collision?

justin2004, Feb 17 '23 22:02

Good question. I haven't ever encountered it. Possibly some hand-correction is required.

rjyounes, Feb 17 '23 22:02

> Along with this, it would be nice to automatically replace spaces with underscores in the incoming column headers; this is what TARQL does.

Indeed, currently, we are just making those strings URL-safe, which results in some unintuitive %20 appearing. Maybe we can think about adding an option to treat them as web page slugs, but even with that, there can be cases where the result is not intuitive anyway (cases, special chars, etc...).

> @rjyounes what if, in a single csv, one column is "state_city" and another is "state city" ? how does TARQL handle the collision?

We already have this problem: sometimes CSVs repeat the same column name. We just add _1 etc. Not great, but intuitive enough.
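
As a rough illustration of the three behaviours mentioned here (URL-safe encoding, underscore slugs, and numeric suffixes on collisions), consider the following Python sketch. The helper names `url_safe`, `slug` and `dedupe` are hypothetical, and the exact suffix scheme (first occurrence unchanged, later ones get _1, _2, ...) is an assumption, not a description of what SPARQL Anything or Tarql actually do.

```python
from urllib.parse import quote


def url_safe(header):
    """Percent-encode a header; a space becomes %20."""
    return quote(header, safe="")


def slug(header):
    """Replace spaces with underscores (the Tarql-like behaviour requested)."""
    return header.strip().replace(" ", "_")


def dedupe(names):
    """Append _1, _2, ... to repeated names (assumed suffix scheme).

    Note: this does not guard against a suffixed name clashing with
    another header that already ends in _1.
    """
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        seen[name] = count + 1
        result.append(name if count == 0 else f"{name}_{count}")
    return result


headers = ["state_city", "state city", "zip"]
print([url_safe(h) for h in headers])
print(dedupe([slug(h) for h in headers]))
```

With these headers, the URL-safe form keeps the two columns distinct but exposes the %20, while the slug form collides and relies on the suffix to tell them apart.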

enridaga, Feb 20 '23 10:02

> Tarql binds values to variables without the need to explicitly express a triple pattern to match/capture the value.

OK, now on the main point. I like the idea of providing a default triple pattern. It's interesting how you would get the same behaviour with the following:

{ fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "true" .
       [] xyz:name ?name ;
          xyz:age ?age ;
          xyz:dog ?dog . 
      }

Without headers, we would need a convention for the variable names (?col_1, etc.):

{ fx:properties
                  fx:location     "/app/proposal.csv" ;
                  fx:csv.headers  "false" .
       [] rdf:_1 ?col_1 ;
          rdf:_2 ?col_2 ;
          rdf:_3 ?col_3 . 
      }

enridaga, Feb 20 '23 10:02

@enridaga and we'd need to wrap each of the triple patterns in an OPTIONAL to get the Tarql behavior.

justin2004, Feb 20 '23 13:02

> @enridaga and we'd need to wrap each of the triple patterns in an OPTIONAL to get the Tarql behavior.

Even if we remove the null-string option?

enridaga, Feb 20 '23 16:02

> Even if we remove the null-string option?

oh, if we don't assert the null-string option then that might be the Tarql behavior.

but i do know that my team likes using the null-string option with the SPARQL Anything OPTIONAL triple patterns (as they transition from Tarql to SPARQL Anything).
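
A toy Python model of the interaction between null-string and OPTIONAL, using the sample file from the top of this issue. This is only a sketch of the matching logic, not how SPARQL Anything evaluates queries; `rows_to_bindings` and `match` are hypothetical names, and whether the no-null-string case matches Tarql exactly is still an open question here.

```python
import csv
import io


def rows_to_bindings(csv_text, null_string=None):
    """Map each row to {column: value}, dropping cells equal to null_string.

    With null_string=None no cell is dropped, so an empty cell is kept
    as the empty string.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {k: v for k, v in row.items() if v != null_string}
        for row in reader
    ]


def match(rows, columns, optional):
    """Required patterns drop rows missing a column; OPTIONAL ones keep them."""
    solutions = []
    for row in rows:
        if not optional and any(c not in row for c in columns):
            continue
        solutions.append({c: row.get(c) for c in columns})
    return solutions


sample = "name,age,dog\nbob,32,fido\njane,,sammy\n"
cols = ["name", "age", "dog"]

with_null = rows_to_bindings(sample, null_string="")
without_null = rows_to_bindings(sample)

print(match(with_null, cols, optional=False))    # jane's row is dropped
print(match(with_null, cols, optional=True))     # jane kept, age unbound
print(match(without_null, cols, optional=False)) # jane kept, age is ""
```

In this model, null-string plus plain patterns loses the jane row, null-string plus OPTIONAL keeps it with age unbound, and dropping null-string keeps it with age bound to the empty string.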

justin2004, Feb 20 '23 19:02