bib-rdf-pipeline
Scripts and configuration for converting MARC bibliographic records into RDF
The MARC records sometimes contain uncertain or inferred years, e.g. `1984?` or `[1850]`, or other special values such as year ranges. These are not valid values for `schema:datePublished`....
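One way to handle such values can be sketched as follows. This is a minimal illustration, not the pipeline's actual behaviour: the function name `normalize_year` is hypothetical, and dropping the uncertainty marker is an assumption about what a consumer of `schema:datePublished` would accept. Values that do not reduce to a single year (such as ranges) return `None` so the caller can decide what to do with them.

```python
import re

# Hypothetical helper, not part of the pipeline.
# Accepts a plain four-digit year, optionally wrapped in square brackets
# and/or followed by a question mark, e.g. "1984", "1984?", "[1850]", "[1850?]".
_SINGLE_YEAR = re.compile(r"^\[?(\d{4})\??\]?\??$")


def normalize_year(raw):
    """Return the four-digit year as a string, or None if the value
    is not a single (possibly uncertain or inferred) year."""
    match = _SINGLE_YEAR.match(raw.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    for value in ["1984?", "[1850]", "1984-1986"]:
        print(value, "->", normalize_year(value))
```

Note that this discards the information that a year was uncertain or inferred; whether that loss is acceptable is a modelling decision.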
Our MARC records have structured page counts, e.g. `vii, 89, 31 s.`. However, Schema.org only defines a single integer field `schema:numberOfPages` so the structured values are not really valid Schema.org....
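As an illustration, a structured page count could be reduced to a single integer by summing its parts. This is only a sketch under stated assumptions: the function names are hypothetical, summing is just one possible interpretation of the structured value, and the parser only understands comma-separated Roman or Arabic numerals with an optional trailing `s.` or `p.` abbreviation. Anything else yields `None` rather than a guess.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(text):
    """Convert a Roman numeral (any case) to an integer."""
    total = 0
    prev = 0
    # Scan right to left: a smaller value before a larger one is subtracted.
    for ch in reversed(text.lower()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


def total_pages(extent):
    """Hypothetical helper: sum the parts of a page count such as
    'vii, 89, 31 s.'. Returns None if any part is not understood."""
    total = 0
    for part in extent.split(","):
        # Drop a trailing page abbreviation such as "s." or "p."
        token = re.sub(r"\s*[sp]\.?$", "", part.strip().lower())
        if re.fullmatch(r"[0-9]+", token):
            total += int(token)
        elif re.fullmatch(r"[ivxlcdm]+", token):
            total += roman_to_int(token)
        else:
            return None
    return total


if __name__ == "__main__":
    print(total_pages("vii, 89, 31 s."))
```

For `vii, 89, 31 s.` this adds 7 + 89 + 31. The original structure is lost in the process, so the structured string may still need to be kept elsewhere.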
As demonstrated by the [latest Travis build](https://travis-ci.org/NatLibFi/bib-rdf-pipeline/builds/302363700), newer Jena versions are stricter with URI parsing and thus the `riot` command used for converting from marc2bibframe2 output (RDF/XML) to N-Triples fails....
For example, the test record `ekumeeninen-00585` has the manufacturer "Saarijärven Offset", but this is not expressed in the current Schema.org output.
This came up during #18. Currently we use person URIs, but the `schema:name` values are based on information from the bibliographic records (as that's all we have - the person...
Currently series membership (mainly from 830 fields) does not make use of volume number information. We should model the periodicals in more detail in the Schema.org output, probably using [PublicationVolume](http://schema.org/PublicationVolume)....
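A rough sketch of what such modelling might look like, expressed as JSON-LD built in Python. The function name, the `example.org` URIs, and the URI pattern for the volume are all hypothetical; only the Schema.org terms `PublicationVolume`, `volumeNumber`, and `isPartOf` are taken from the vocabulary. The idea is that the book is part of a volume, which in turn is part of the series.

```python
import json


def series_membership(book_uri, series_uri, volume_number):
    """Hypothetical helper: describe a book as part of a numbered volume
    of a series, using an intermediate schema:PublicationVolume node."""
    # Assumed URI pattern for the volume; the pipeline may mint URIs differently.
    volume_uri = f"{series_uri}/volume/{volume_number}"
    return {
        "@context": "http://schema.org/",
        "@id": book_uri,
        "isPartOf": {
            "@id": volume_uri,
            "@type": "PublicationVolume",
            "volumeNumber": str(volume_number),
            "isPartOf": {"@id": series_uri},
        },
    }


if __name__ == "__main__":
    doc = series_membership(
        "http://example.org/book/1", "http://example.org/series/9", 12
    )
    print(json.dumps(doc, indent=2))
```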
As suggested by @VladimirAlexiev, we could use [RDFUnit](http://aksw.org/Projects/RDFUnit.html) in the unit test suite. An ideal use would be checking the Schema.org output to make sure it follows Schema.org conventions.