pycsw

Issues with parsing DC RDF/XML records

Open pvgenuchten opened this issue 1 year ago • 1 comments

I noticed some unexpected issues when parsing DC RDF/XML files:

  • dc:identifier is not always populated, which leads to a db error indicating that identifier is not nillable; suggestion here is to use the identifier of the record, rdf:about='{id}', as an alternate identifier
  • DC and DCTERMS have some duplicate terms (title, description, language, format, date) which can replace each other in the ontology; I wonder if the parser picks up both, my impression is that it does not
  • in many cases dc:description is used instead of dct:abstract
  • contacts property on record is not populated for creator, publisher, ...
  • links property on record is not populated from relation, source, ...
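
To make the suggestions above concrete, here is a minimal sketch of the fallbacks using only the Python standard library. It is not pycsw's parser: the function name `parse_dc_record`, the helper names, the returned dict keys, and the choice of which elements feed contacts (creator, publisher, contributor) and links (relation, source) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def value_of(el):
    """Return the element text, or its rdf:resource attribute if there is no text."""
    if el.text and el.text.strip():
        return el.text.strip()
    return el.get(RDF_RESOURCE)


def first_value(node, *paths):
    """Return the first non-empty value among the given child paths, in order."""
    for path in paths:
        for el in node.findall(path, NS):
            value = value_of(el)
            if value:
                return value
    return None


def parse_dc_record(xml_text):
    """Hypothetical helper: map one rdf:Description to a flat dict."""
    root = ET.fromstring(xml_text)
    if root.tag == "{%s}Description" % NS["rdf"]:
        desc = root
    else:
        desc = root.find("rdf:Description", NS)
    if desc is None:
        raise ValueError("no rdf:Description element found")

    # Identifier: dc/dct identifier first, then fall back to rdf:about.
    identifier = first_value(desc, "dc:identifier", "dct:identifier")
    if not identifier:
        identifier = desc.get(RDF_ABOUT)

    # Contacts and links: assumed mapping, one entry per matching element.
    contacts = []
    for role in ("creator", "publisher", "contributor"):
        for prefix in ("dc", "dct"):
            for el in desc.findall("%s:%s" % (prefix, role), NS):
                if value_of(el):
                    contacts.append({"role": role, "name": value_of(el)})
    links = []
    for rel in ("relation", "source"):
        for prefix in ("dc", "dct"):
            for el in desc.findall("%s:%s" % (prefix, rel), NS):
                if value_of(el):
                    links.append({"rel": rel, "url": value_of(el)})

    return {
        "identifier": identifier,
        # Duplicate terms: accept either namespace.
        "title": first_value(desc, "dct:title", "dc:title"),
        # Abstract: prefer dct:abstract, else fall back to a description.
        "abstract": first_value(desc, "dct:abstract", "dct:description", "dc:description"),
        "contacts": contacts,
        "links": links,
    }
```

With this approach a record that only carries rdf:about and dc:description still yields an identifier and an abstract, so the non-nillable identifier column would not be hit.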

pvgenuchten • Jul 11 '24 13:07