
Convert data into RDF

Open jze opened this issue 3 years ago • 1 comments

Overview

A Frictionless Data Table Schema could be used to convert the data it describes from CSV files into RDF. Such programs already exist for the CSV on the Web (CSVW) standard. The motivation and the process are sketched at https://csvw.org/guides/why-use-csvw.html and I think the same should be doable with Frictionless Data.

If the Table Schema contains rdfType properties for all columns, it is easy to generate the RDF data. Otherwise we have to generate a property name. One idea: combine the location of the Table Schema with the column's name. The column's type tells us which RDF datatype to use for the literal.
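
The property and datatype selection described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names `property_uri` and `literal_datatype` are made up for this example, the type-to-XSD mapping is one possible choice rather than anything defined by Frictionless Data, and joining the schema location and column name with `#` is just one way to combine them.

```python
from urllib.parse import quote

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed mapping from Table Schema field types to XSD datatypes.
# "number" could just as well map to xsd:double; this is a design choice.
TYPE_TO_XSD = {
    "string": XSD + "string",
    "integer": XSD + "integer",
    "number": XSD + "decimal",
    "boolean": XSD + "boolean",
    "date": XSD + "date",
    "time": XSD + "time",
    "datetime": XSD + "dateTime",
    "year": XSD + "gYear",
    "yearmonth": XSD + "gYearMonth",
}


def property_uri(field, schema_url):
    """Use the field's rdfType if present, else schema location + '#' + name."""
    rdf_type = field.get("rdfType")
    if rdf_type:
        return rdf_type
    base = schema_url.split("#", 1)[0]  # drop any existing fragment
    return base + "#" + quote(field["name"], safe="")


def literal_datatype(field):
    """Pick an XSD datatype from the field type; fall back to xsd:string."""
    return TYPE_TO_XSD.get(field.get("type", "string"), XSD + "string")
```

For a field such as `{"name": "street", "type": "string"}` without an rdfType, `property_uri` falls back to the schema URL plus `#street`.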

Each row of the CSV file could become a blank node. However, it would be better if we could assign a URI to it. If the Table Schema contains a primary key, it could be used to construct a URI for the row, either by using the location of the Table Schema or by specifying a URI prefix.
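
A minimal sketch of that subject selection, assuming the URI prefix is passed in by the caller (how it would be specified in the schema is left open here). The helper name `row_subject` and the `_:rowN` blank node labels are hypothetical; compound primary keys are joined with `/`, which is only one possible convention.

```python
from urllib.parse import quote


def row_subject(row, schema, uri_prefix, row_number):
    """Return a Turtle subject term: a URI built from the primary key,
    or a blank node label when no usable key is available."""
    key = schema.get("primaryKey")
    if not key:
        return f"_:row{row_number}"

    # Table Schema allows primaryKey to be a single name or a list of names.
    key_fields = [key] if isinstance(key, str) else list(key)
    values = [row.get(name) for name in key_fields]
    if any(value is None or value == "" for value in values):
        return f"_:row{row_number}"

    local = "/".join(quote(str(value), safe="") for value in values)
    return f"<{uri_prefix}{local}>"
```

With a prefix ending in `?tsa_oe_id=` and a primary key value of `9099793`, the key is simply appended to the prefix.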

Example

  • Data Resource: https://opendata.zitsh.de/frictionless/schulen-aktuell.resource.yml
  • Table Schema (linked in the Data Resource): https://opendata.zitsh.de/frictionless/schulen-aktuell.schema.json
  • CSV file (linked in the Data Resource): https://opendata.schleswig-holstein.de/collection/schulen/aktuell.csv

Suppose that https://zufish.schleswig-holstein.de/portaldeeplink?tsa_oe_id= is specified in some way as the base for the primary keys.

Here is an idea of what the RDF output for the first row could look like:

<https://zufish.schleswig-holstein.de/portaldeeplink?tsa_oe_id=9099793>
	<https://schema.org/name> "Goethe-Gemeinschaftsschule, Gemeinschaftsschule der Landeshauptstadt Kiel in Kiel" ;
	<https://schema.org/addressLocality> "Kiel" ;
	<https://schema.org/postalCode>	24118 ;
	<https://opendata.schleswig-holstein.de/dataset/schulen#street>	"Westring" ;
	<https://opendata.schleswig-holstein.de/dataset/schulen#houseNumber> "358" ;
	<https://schema.org/telephone> "+49 431 2604285" ;
	<https://schema.org/faxNumber> "+49 431 26042869" ;
	<https://schema.org/email> "[email protected]" ;
	<https://schema.org/url> "http://www.ggs-kiel.de" ;
	<https://schema.org/longitude> 10.125675992206988 ;
	<https://schema.org/latitude> 54.33448298508547 ;
	<https://opendata.schleswig-holstein.de/dataset/schulen#competences> "8965622,9019836" .
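
Output like the above could be produced with very little code. The following is a rough sketch that emits N-Triples lines (one triple per line, which is also valid Turtle) instead of the grouped form shown above. The names `literal` and `row_to_ntriples` and the shape of the `properties` mapping are assumptions made for this example; empty cells are skipped, and typed values are written as quoted literals with an explicit datatype.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def escape(text):
    """Escape characters that may not appear raw in a quoted literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def literal(value, datatype=XSD_STRING):
    """Format a value as a literal; plain strings carry no datatype suffix."""
    quoted = f'"{escape(str(value))}"'
    return quoted if datatype == XSD_STRING else f"{quoted}^^<{datatype}>"


def row_to_ntriples(subject, row, properties):
    """properties maps column name -> (property URI, datatype URI)."""
    lines = []
    for column, (prop, datatype) in properties.items():
        value = row.get(column)
        if value is None or value == "":
            continue  # no triple for a missing or empty cell
        lines.append(f"<{subject}> <{prop}> {literal(value, datatype)} .")
    return lines
```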

CSVW output

The output generated by csv2rdf would look like this:

  csvw:table [
    a csvw:Table ;
    csvw:row [
      a csvw:Row ;
      csvw:describes [
        ns0:id 9099793 ;
        ns0:name "Goethe-Gemeinschaftsschule, Gemeinschaftsschule der Landeshauptstadt Kiel in Kiel" ;
        ns0:city "Kiel" ;
        ns0:zipcode 24118 ;
        ns0:street "Westring" ;
        ns0:houseNumber "358" ;
        ns0:fax "+49 431 2604285" ;
        ns0:email "+49 431 2604286" ;
        ns0:website "[email protected]" ;
        ns0:longitude 10.125675992206988 ;
        ns0:latitude 54.33448298508547 ;
        ns0:competences "8965622,9019836" ;
      ] ;
      csvw:rownum 1 ;
      csvw:url ns0:row=2
    ],

It uses a lot of blank nodes. That may be easier to generate, but it is not as useful, at least in my opinion.


More ideas:

  • multiple Data Resources within a Data Package could be linked into one RDF document using their foreignKeys properties
  • specify the rdf:type assigned to the item generated for each row
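
Both ideas could be sketched along these lines. The assumptions: each Data Resource has its own URI prefix, supplied in a hypothetical `prefixes` mapping keyed by resource name; the linking property is derived from the schema location and the foreign key field names; and the class for each row is passed in as `row_class`. None of this is an existing Frictionless Data feature.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def as_list(value):
    """Table Schema allows a single field name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def link_triples(subject, row, schema, prefixes, schema_url, row_class=None):
    """Return (subject, predicate, object) URI triples for the row's
    rdf:type and for each foreign key that points to a known resource."""
    triples = []
    if row_class:
        triples.append((subject, RDF_TYPE, row_class))

    for foreign_key in schema.get("foreignKeys", []):
        fields = as_list(foreign_key["fields"])
        target = foreign_key["reference"]["resource"]
        values = [row.get(name) for name in fields]
        if target not in prefixes or any(v is None or v == "" for v in values):
            continue  # unknown target resource or incomplete key
        target_uri = prefixes[target] + "/".join(
            quote(str(v), safe="") for v in values)
        predicate = schema_url + "#" + "-".join(fields)
        triples.append((subject, predicate, target_uri))
    return triples
```

Because the object of each link is built with the same prefix that the referenced resource uses for its own rows, the triples from several resources end up pointing at the same URIs and can be merged into one RDF document.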

jze, Jun 06 '22 07:06

Thank you, @jze! We will review it.

shashigharti, Jun 06 '22 07:06