rdf-tabular
How to parse a CSV file with a metadata file
I have a CSV file and a metadata file on my file system. With csv2rdf I can write
csv2rdf -t data/pplEx.csv -u pplEx.csv-meta.json -m minimal
to transform data/pplEx.csv using the pplEx.csv-meta.json metadata file. This returns N-Triples for the CSV file.
I can't work out how to do the same with rdf-tabular. All the examples use HTTP URLs, which gives me the impression that one first has to set up special HTTP headers for the CSV. Is that right, or have I missed the command-line option needed when debugging?
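For context on the HTTP route: the CSVW "Model for Tabular Data" spec lets a server point at a CSV file's metadata with a `Link` header using `rel="describedby"`. The same spec also expects processors to let users supply their own (overriding) metadata, so headers are not meant to be the only route. Below is a minimal Python sketch, not rdf-tabular code, of pulling the metadata URL out of such a header. The function name `describedby_links` and the `example.com` URLs are hypothetical, and the parser is naive: it assumes no commas inside the URLs.

```python
import re
from urllib.parse import urljoin

# One link-value: "<target>" followed by its ";"-separated parameters.
LINK_RE = re.compile(r'<([^>]*)>([^,]*)')
# The rel parameter, quoted or unquoted.
REL_RE = re.compile(r'rel\s*=\s*"?([^";]+)"?')


def describedby_links(link_header: str, base_url: str) -> list[str]:
    """Return absolute URLs of links whose rel includes 'describedby'."""
    found = []
    for target, params in LINK_RE.findall(link_header):
        match = REL_RE.search(params)
        # rel may hold several space-separated relation types.
        if match and "describedby" in match.group(1).split():
            # Relative targets resolve against the CSV file's URL.
            found.append(urljoin(base_url, target))
    return found


header = '<pplEx.csv-meta.json>; rel="describedby"; type="application/csvm+json"'
print(describedby_links(header, "http://example.com/data/pplEx.csv"))
```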
I wanted to see if rdf-tabular had more advanced features than csv2rdf. For example, I was interested to see what I need to do to get foreign keys to work. This is the CSV file:
Id,Name,DoB,Sex,mother
1,Linus,02-07-2016,male,4
2,Oliver,02-07-2016,male,4
3,Anaïs,10-09-2014,female,4
4,Gordana,30-05-1982,female,
This is the metadata file:
{
  "@context": [ "http://www.w3.org/ns/csvw", { "@language": "en" } ],
  "dc:title": "example people data",
  "tableSchema": {
    "@id": "http://example.com/",
    "columns": [
      {
        "name": "Id"
      }, {
        "name": "Name",
        "datatype": "string"
      }, {
        "name": "DoB",
        "datatype": {
          "base": "date",
          "format": "dd-MM-yyyy"
        }
      }, {
        "name": "Sex",
        "datatype": "string"
      }, {
        "name": "mother"
      }
    ],
    "primaryKey": "Id",
    "foreignKeys": [{
      "columnReference": "mother",
      "reference": {
        "schemaReference": "http://example.com/",
        "columnReference": "Id"
      }
    }]
  }
}
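The `DoB` column declares a `format` of `dd-MM-yyyy`, so a processor has to turn a cell like `10-09-2014` into the canonical `xsd:date` form `2014-09-10`, as in the desired Turtle output later in this thread. A minimal Python sketch of that conversion, assuming a small hand-written mapping from a few CSVW date patterns to `strptime` directives; the helper `to_xsd_date` and the mapping are hypothetical and cover only these patterns:

```python
from datetime import datetime

# Hypothetical mapping for a few CSVW date patterns; real processors
# support the full list of date formats defined in the CSVW spec.
CSVW_TO_STRPTIME = {
    "dd-MM-yyyy": "%d-%m-%Y",
    "dd/MM/yyyy": "%d/%m/%Y",
    "yyyy-MM-dd": "%Y-%m-%d",
}


def to_xsd_date(value: str, csvw_format: str) -> str:
    """Parse a cell using a CSVW date format and return the xsd:date lexical form."""
    parsed = datetime.strptime(value, CSVW_TO_STRPTIME[csvw_format])
    return parsed.date().isoformat()


for cell in ["02-07-2016", "10-09-2014", "30-05-1982"]:
    print(cell, "->", to_xsd_date(cell, "dd-MM-yyyy"))
```

An impossible date such as `31-02-2016` raises `ValueError`, which is roughly where a validating processor would report an invalid cell.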
Typically, the metadata is looked for in the same place as the CSV (or vice versa). For example, if you were to clone the repo and install the rdf-tabular, rdf-turtle, and rdf gems, you can do the following:
rdf serialize --input-format tabular --output-format ttl etc/doap.csv
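The lookup described above matches the default metadata locations in the CSVW "Model for Tabular Data" spec: after any `Link` header, and when no site-wide `/.well-known/csvm` file says otherwise, a processor tries `{+url}-metadata.json` and then `csv-metadata.json` relative to the CSV. A minimal Python sketch of that expansion, assuming an absolute CSV URL; the function name is hypothetical, and only the `{+url}` variable is handled rather than full RFC 6570 URI templates:

```python
from urllib.parse import urljoin

# Default location templates from the CSVW "Model for Tabular Data" spec,
# used when no site-wide /.well-known/csvm file is found.
DEFAULT_TEMPLATES = ["{+url}-metadata.json", "csv-metadata.json"]


def candidate_metadata_locations(csv_url: str) -> list[str]:
    """Return the default metadata URLs to try, in order, for an absolute CSV URL."""
    candidates = []
    for template in DEFAULT_TEMPLATES:
        # Minimal stand-in for URI template expansion: only {+url} is supported.
        expanded = template.replace("{+url}", csv_url)
        # Relative results (like "csv-metadata.json") resolve against the CSV URL.
        candidates.append(urljoin(csv_url, expanded))
    return candidates


print(candidate_metadata_locations("http://example.com/data/pplEx.csv"))
```

Under this rule a file named `pplEx.csv-meta.json` is never tried, while `pplEx.csv-metadata.json` next to the CSV is.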
You should also be able to do essentially the same thing on the distiller, using the text forms for both the CSV and the metadata.
Note that your metadata will need a url property referencing the CSV (a requirement I was never really on board with), and you may have issues with your foreign keys referencing a table that isn't specified. Start off simple, and add features as you go. The CSVW repo has a bunch of examples.
Yes, I am trying to put together the simplest foreign key example: the one where the key is in the same table schema as the table itself. I could not find that described anywhere in the W3C docs. I think I did find an example in the GitHub repo, but I did not seem to get any interesting result with csv2rdf, so I am not sure...
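For the same-table case, the constraint in the metadata above says that every non-empty `mother` value must match some `Id` in the table. As far as I understand the CSVW specs, `foreignKeys` act as a validation constraint and do not by themselves add links to the RDF output, which may be why csv2rdf shows nothing new. A minimal Python sketch of the check itself, not either tool's implementation; `foreign_key_violations` is a hypothetical name, and empty cells are treated as null, matching the CSVW default `null` of an empty string:

```python
import csv
import io

PEOPLE_CSV = """Id,Name,DoB,Sex,mother
1,Linus,02-07-2016,male,4
2,Oliver,02-07-2016,male,4
3,Anaïs,10-09-2014,female,4
4,Gordana,30-05-1982,female,
"""


def foreign_key_violations(csv_text: str, column: str, ref_column: str) -> list[tuple[int, str]]:
    """Return (line number, value) for cells in `column` with no matching `ref_column`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Referenced keys come from the same table (a self-referencing foreign key).
    keys = {row[ref_column] for row in rows}
    violations = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        value = row[column]
        if value == "":  # empty cell = null, so there is nothing to check
            continue
        if value not in keys:
            violations.append((line_no, value))
    return violations


print(foreign_key_violations(PEOPLE_CSV, "mother", "Id"))  # → []
```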
My use case is also to think of CSVW as a schema language that could be reused on an open-ended number of matching CSV files. For that, it helps to be able to pass the metadata file on the command line, not just via the HTTP header. In any case, for command-line exploration of the tool, being able to pass it as an argument is really useful; it would have been too difficult to get going without that. (I opened a similar issue on the Python implementation.)
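Reusing one metadata file across many matching CSV files amounts to checking each file against the same schema. A simplified Python sketch, assuming only that the header row must list the schema's non-virtual column `name`s in order; the actual CSVW compatibility rules are more involved (they also consider column `titles`, for example), and `header_matches` is a hypothetical name:

```python
import csv
import io
import json

# Trimmed copy of the tableSchema from the metadata above.
METADATA = """{
  "tableSchema": {
    "columns": [
      {"name": "Id"}, {"name": "Name"}, {"name": "DoB"},
      {"name": "Sex"}, {"name": "mother"}
    ]
  }
}"""


def header_matches(metadata_json: str, csv_text: str) -> bool:
    """True if the CSV header lists the schema's non-virtual column names in order."""
    columns = json.loads(metadata_json)["tableSchema"]["columns"]
    expected = [col["name"] for col in columns if not col.get("virtual", False)]
    header = next(csv.reader(io.StringIO(csv_text)))
    return header == expected


print(header_matches(METADATA, "Id,Name,DoB,Sex,mother\n5,Mia,01-01-2000,female,4\n"))
print(header_matches(METADATA, "Id,Name,Birthday,Sex,mother\n"))
```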
After placing the metadata file, named pplex.csv-metadata.json, in the same directory as the CSV file, I was able to produce the RDF with:
rdf serialize --input-format tabular --output-format ttl pplex.csv
What I was hoping to get from the foreignKeys construct was to see whether it would align the blank nodes without specifying any URL.
But I guess foreignKeys only really work with the valueUrl and aboutUrl properties set.
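In CSVW the links come from URI templates: `aboutUrl` sets each row's subject and `valueUrl` turns a cell into a reference. A minimal Python sketch of the effect, assuming hypothetical templates `person/{Id}` and `person/{mother}` resolved against a hypothetical base `http://example.com/`, a hypothetical predicate `http://example.com/#mother`, and a simple `{column}` substitution rather than full RFC 6570 expansion:

```python
import csv
import io
import re
from urllib.parse import urljoin

PEOPLE_CSV = """Id,Name,DoB,Sex,mother
1,Linus,02-07-2016,male,4
2,Oliver,02-07-2016,male,4
3,Anaïs,10-09-2014,female,4
4,Gordana,30-05-1982,female,
"""

BASE = "http://example.com/"  # hypothetical base URL


def expand(template: str, row: dict[str, str]) -> str:
    """Replace each {column} in the template with that row's cell value."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)


def mother_triples(csv_text: str, about_url: str, value_url: str) -> list[str]:
    """Emit N-Triples linking each row's subject to its mother's subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["mother"] == "":  # null cell: no triple is generated
            continue
        subject = urljoin(BASE, expand(about_url, row))
        mother = urljoin(BASE, expand(value_url, row))
        triples.append(f"<{subject}> <{BASE}#mother> <{mother}> .")
    return triples


for triple in mother_triples(PEOPLE_CSV, "person/{Id}", "person/{mother}"):
    print(triple)
```

Because `person/{mother}` and `person/{Id}` expand to the same URL for Gordana's row, the mother references line up with her subject, which blank nodes alone cannot do.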
I am trying to put together a demonstration showing that, just by declaring <#Id> to be an owl:InverseFunctionalProperty, I can avoid having to specify URLs for subjects or values. That is, I was trying to get the following output from the table above:
[] <#Name> "Anaïs" ;
   <#DoB> "2014-09-10"^^<http://www.w3.org/2001/XMLSchema#date> ;
   <#Id> 3 ;
   <#Sex> "female" ;
   <#mother> [ <#Id> 4 ] .
But I don't think that is possible even with virtual columns.
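Outside CSVW, the effect this demonstration is after is what OWL semantics give for an owl:InverseFunctionalProperty: two nodes with the same <#Id> value denote the same individual, so a consumer can merge ("smush") them. A minimal Python sketch of that merge over the intended output, assuming each node is a dict that always has an `Id` and that `mother` holds a nested `{"Id": ...}` node; both representations and the name `smush_by_id` are hypothetical:

```python
def smush_by_id(nodes: list[dict], key: str = "Id") -> dict:
    """Merge nodes sharing a value for an inverse-functional key."""
    merged: dict = {}
    for node in nodes:
        # Same key value => same individual, so their properties are combined.
        merged.setdefault(node[key], {}).update(node)
    for node in merged.values():
        ref = node.get("mother")
        # Replace a bare reference like {"Id": 4} with the merged node it denotes.
        if isinstance(ref, dict) and ref.get(key) in merged:
            node["mother"] = merged[ref[key]]
    return merged


people = [
    {"Id": 3, "Name": "Anaïs", "Sex": "female", "mother": {"Id": 4}},
    {"Id": 4, "Name": "Gordana", "Sex": "female"},
]
result = smush_by_id(people)
print(result[3]["mother"]["Name"])  # → Gordana
```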
That's probably worth filing in w3c/csvw for some hypothetical future group to take up, and to get more visibility from those who watch it.
I wrote something up here: https://github.com/w3c/csvw/issues/885