bib-rdf-pipeline
Scripts and configuration for converting MARC bibliographic records into RDF
The COMHIS team has crafted many cleanup functions for Fennica records. They have provided CSV files with substitutions (in practice: record ID, old value, new value). These should be integrated...
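A minimal sketch of how such substitutions could be applied, assuming each CSV row holds exactly three columns in the order record ID, old value, new value, with no header row. The function names and the sample data are hypothetical; the actual COMHIS files may be laid out differently.

```python
import csv
import io


def load_substitutions(csv_text):
    """Parse rows of (record ID, old value, new value) into a lookup dict.

    Assumption: three columns per row, no header. Other rows are skipped.
    """
    subs = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue
        record_id, old_value, new_value = row
        subs[(record_id, old_value)] = new_value
    return subs


def apply_substitution(subs, record_id, value):
    """Return the replacement for this record and value, or the value unchanged."""
    return subs.get((record_id, value), value)


# Hypothetical sample row, for illustration only.
sample = "000123,Helsingfors,Helsinki\n"
subs = load_substitutions(sample)
print(apply_substitution(subs, "000123", "Helsingfors"))
```

Keying the lookup on both the record ID and the old value means a substitution only fires for the record it was written for, so a correction for one record cannot change the same string in another.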
The instance I00590886000 has publisher `Kopijyvä [jakaja` where the "jakaja" suffix isn't properly stripped
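One possible way to strip this suffix is sketched below. It assumes the marker always appears at the end of the publisher string as a bracketed `jakaja`, with or without the closing bracket. The function name `strip_distributor_suffix` is hypothetical and not part of the pipeline.

```python
import re

# Assumption: the distributor marker is "[jakaja" or "[jakaja]" at the end
# of the string, optionally surrounded by whitespace.
_DISTRIBUTOR_SUFFIX = re.compile(r"\s*\[\s*jakaja\s*\]?\s*$", re.IGNORECASE)


def strip_distributor_suffix(publisher):
    """Remove a trailing "[jakaja" or "[jakaja]" marker from a publisher name."""
    return _DISTRIBUTOR_SUFFIX.sub("", publisher)


print(strip_distributor_suffix("Kopijyvä [jakaja"))
```

Names without the marker pass through unchanged, so the function can be applied to every publisher value.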
For example, the work W00508353600 has four instances, two of which only have the note "Julkaistu myös painettuna". These are in principle links to other instances, but with so little...
In many cases we could detect the book format (Hardcover or Paperback) based on information given in the 020 (ISBN) field, e.g. these records in the `kotona` test set: ```...
The SPARQL CONSTRUCT query to turn BF2.0 into Schema is quite slow. It is currently the slowest part of the conversion pipeline. I think there's room to optimize, for example...
I'm still not happy with the way series are modelled. Currently, it's the Work that is part of a Series (which is also a Work). I think that the Instance...
There are several supposedly RDA Carrier categories in the metadata that don't actually match the official values. See the breakdown in #15. For example, `Digitaalinen jäljenne` does not exist in...
Currently all schema.org bibliographic entities are typed as `schema:CreativeWork` and instance level entities also as `schema:Book`. The latter is wrong in some cases. We should use the correct, more specific...
E.g. http://urn.fi/URN:NBN:fi:bib:me:O00000698301 is generated from http://urn.fi/URN:NBN:fi:bib:me:000006983#Agent880-32 . The 880 field is special: we would really need to check subfield 6 to see how it should be interpreted, but that information is not...
Currently subjects get attached to some works, but not all. For example, for translated works, subjects are attached to the translation work but not the original work. In the consolidate...