Convert data into RDF
Overview
A Frictionless Data Table Schema could be used to convert the data it describes from CSV files into RDF. Such converters already exist for the CSV on the Web (CSVW) standard (see https://csvw.org/guides/why-use-csvw.html for the motivation and the process). I think the same should be doable with Frictionless Data.
If the Table Schema contains rdfType properties for all columns, generating the RDF data is easy: the rdfType of a column is used as the property for its values. Otherwise we have to generate a property name. One idea is to combine the location of the Table Schema with the column's name. The column's type tells us which RDF datatype to use for the literal.
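A minimal sketch of this property and datatype selection in Python. The function names `predicate_for` and `datatype_for`, the `#` separator in the generated property name, and the type-to-XSD mapping are my assumptions; none of them is defined by Frictionless Data.

```python
from urllib.parse import quote

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed mapping from Table Schema field types to XSD datatypes.
XSD_BY_FIELD_TYPE = {
    "string": XSD + "string",
    "integer": XSD + "integer",
    "number": XSD + "decimal",
    "boolean": XSD + "boolean",
    "date": XSD + "date",
    "time": XSD + "time",
    "datetime": XSD + "dateTime",
    "year": XSD + "gYear",
    "yearmonth": XSD + "gYearMonth",
}


def predicate_for(field, schema_location):
    """Return the property IRI for a Table Schema field descriptor."""
    if field.get("rdfType"):
        return field["rdfType"]
    # Fallback (assumption): schema location + '#' + percent-encoded column name.
    return schema_location + "#" + quote(field["name"], safe="")


def datatype_for(field):
    """Return the XSD datatype IRI for a field; unknown or missing types are treated as strings."""
    return XSD_BY_FIELD_TYPE.get(field.get("type", "string"), XSD + "string")
```

Whether `number` should map to `xsd:decimal` or `xsd:double` is a design choice; `xsd:decimal` is used here because it keeps the written value exact.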
Each row of the CSV file could become a blank node. However, it would be better to assign a URI to it. If the Table Schema contains a primary key, it could be used to construct a URI for the row, either by using the location of the Table Schema or by specifying a URI prefix.
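The row identification could be sketched like this, assuming the URI prefix is passed in by the caller. `row_subject` is a hypothetical helper, and joining composite keys with `/` is an arbitrary choice of mine.

```python
from urllib.parse import quote


def row_subject(row, schema, uri_prefix, row_number):
    """Return a Turtle subject term: <IRI> if a primary key exists, else a blank node label."""
    key = schema.get("primaryKey")
    if not key:
        # No primary key: fall back to a blank node per row.
        return f"_:row{row_number}"
    # primaryKey may be a single field name or a list of field names.
    names = [key] if isinstance(key, str) else list(key)
    # Assumption: composite key values are joined with "/".
    local = "/".join(quote(str(row[name]), safe="") for name in names)
    return f"<{uri_prefix}{local}>"
```

For example, `row_subject({"id": "9099793"}, {"primaryKey": "id"}, "https://zufish.schleswig-holstein.de/portaldeeplink?tsa_oe_id=", 1)` yields the subject used in the example below.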
Example
- Data Resource: https://opendata.zitsh.de/frictionless/schulen-aktuell.resource.yml
- Table Schema (linked in the Data Resource): https://opendata.zitsh.de/frictionless/schulen-aktuell.schema.json
- CSV file (linked in the Data Resource): https://opendata.schleswig-holstein.de/collection/schulen/aktuell.csv
Assume that https://zufish.schleswig-holstein.de/portaldeeplink?tsa_oe_id= is specified, in some way still to be defined, as the URI prefix for the primary keys.
Here is an idea of what the RDF output for the first row could look like:
<https://zufish.schleswig-holstein.de/portaldeeplink?tsa_oe_id=9099793>
    <https://schema.org/name> "Goethe-Gemeinschaftsschule, Gemeinschaftsschule der Landeshauptstadt Kiel in Kiel" ;
    <https://schema.org/addressLocality> "Kiel" ;
    <https://schema.org/postalCode> 24118 ;
    <https://opendata.schleswig-holstein.de/dataset/schulen#street> "Westring" ;
    <https://opendata.schleswig-holstein.de/dataset/schulen#houseNumber> "358" ;
    <https://schema.org/telephone> "+49 431 2604285" ;
    <https://schema.org/faxNumber> "+49 431 26042869" ;
    <https://schema.org/email> "[email protected]" ;
    <https://schema.org/url> "http://www.ggs-kiel.de" ;
    <https://schema.org/longitude> 10.125675992206988 ;
    <https://schema.org/latitude> 54.33448298508547 ;
    <https://opendata.schleswig-holstein.de/dataset/schulen#competences> "8965622,9019836" .
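To illustrate the whole process, here is a rough end-to-end sketch using only the Python standard library. It emits N-Triples rather than Turtle because N-Triples can be written one line per cell. The reduced schema, the shortened sample row, the function name `rows_to_ntriples`, and the `uri_prefix` and `fallback_base` parameters are assumptions for illustration; this is not the real schulen-aktuell schema and not an existing Frictionless API.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed type mapping; only a few Table Schema types are covered here.
DATATYPES = {"integer": XSD + "integer", "number": XSD + "decimal",
             "boolean": XSD + "boolean", "date": XSD + "date"}


def escape(text):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(csv_text, schema, uri_prefix, fallback_base):
    """Yield one N-Triples line per non-empty cell of the CSV data."""
    key = schema["primaryKey"]  # assumption: single-column primary key
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{uri_prefix}{row[key]}>"
        for field in schema["fields"]:
            name = field["name"]
            value = row.get(name, "")
            if name == key or value == "":
                continue  # the key is already in the subject; skip empty cells
            predicate = field.get("rdfType") or f"{fallback_base}#{name}"
            datatype = DATATYPES.get(field.get("type", "string"))
            suffix = f"^^<{datatype}>" if datatype else ""
            yield f'{subject} <{predicate}> "{escape(value)}"{suffix} .'


# Hypothetical reduced schema and shortened sample row.
schema = {
    "primaryKey": "id",
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string", "rdfType": "https://schema.org/name"},
        {"name": "zipcode", "type": "integer", "rdfType": "https://schema.org/postalCode"},
        {"name": "street", "type": "string"},
    ],
}
csv_text = "id,name,zipcode,street\n9099793,Goethe-Gemeinschaftsschule,24118,Westring\n"
lines = list(rows_to_ntriples(
    csv_text, schema,
    "https://zufish.schleswig-holstein.de/portaldeeplink?tsa_oe_id=",
    "https://opendata.schleswig-holstein.de/dataset/schulen",
))
for line in lines:
    print(line)
```

Columns without an rdfType fall back to a generated property, and plain strings are written without a datatype suffix.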
CSVW output
The output generated by csv2rdf would look like this:
csvw:table [
    a csvw:Table ;
    csvw:row [
        a csvw:Row ;
        csvw:describes [
            ns0:id 9099793 ;
            ns0:name "Goethe-Gemeinschaftsschule, Gemeinschaftsschule der Landeshauptstadt Kiel in Kiel" ;
            ns0:city "Kiel" ;
            ns0:zipcode 24118 ;
            ns0:street "Westring" ;
            ns0:houseNumber "358" ;
            ns0:fax "+49 431 2604285" ;
            ns0:email "+49 431 2604286" ;
            ns0:website "[email protected]" ;
            ns0:longitude 10.125675992206988 ;
            ns0:latitude 54.33448298508547 ;
            ns0:competences "8965622,9019836" ;
        ] ;
        csvw:rownum 1 ;
        csvw:url ns0:row=2
    ],
It uses a lot of blank nodes. That may be easier to generate, but in my opinion it is less useful.
More ideas:
- multiple Data Resources within a Data Package could be linked into one RDF document using their foreignKeys properties
- specify the rdf:type assigned to the item generated for each row
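Both ideas could be sketched as follows. `link_triples`, the `uri_prefixes` lookup (resource name to URI prefix), the `base` used for generated property names, and the `row_class` parameter are all hypothetical; the sketch only handles foreign keys in the single-column string form.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def link_triples(subject, row, schema, uri_prefixes, base, row_class=None):
    """Return N-Triples lines for the row's class and its foreign-key links."""
    triples = []
    if row_class:
        # Assign the configured rdf:type to the item generated for this row.
        triples.append(f"<{subject}> <{RDF_TYPE}> <{row_class}> .")
    for fk in schema.get("foreignKeys", []):
        column = fk["fields"]  # assumption: single-column string form
        value = row.get(column, "")
        if value == "":
            continue  # no reference in this row
        # Point at the URI of the referenced row in the other Data Resource.
        target = uri_prefixes[fk["reference"]["resource"]] + value
        triples.append(f"<{subject}> <{base}#{column}> <{target}> .")
    return triples
```

Because the object of the link is a URI instead of a literal, rows from several Data Resources end up connected in one graph.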
Thank you, @jze! We will review it.