pycsw

Issues with parsing DC RDF/XML records

pvgenuchten opened this issue 1 year ago • 1 comment

I noticed some unexpected issues when parsing Dublin Core (DC) RDF/XML files:

  • dc:identifier is not always populated, which leads to a database error because identifier is non-nullable; my suggestion is to fall back to the record's own identifier, rdf:about='{id}'
  • DC and DCTERMS share some duplicate terms (title, description, language, format, date) that are interchangeable in the ontology; I wonder whether the parser picks up both variants, and my impression is that it does not
  • in many cases dc:description is used instead of dct:abstract
  • the contacts property of the record is not populated from creator, publisher, ...
  • the links property of the record is not populated from relation, source, ...
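The fixes suggested in the list above could look roughly like the following minimal sketch. It uses only the Python standard library and is not pycsw's actual parser or repository model: the output dict, the FIELDS/CONTACT_ROLES/LINK_TERMS mappings and the contact/link shapes are hypothetical, and it assumes the record is the first node inside rdf:RDF (typically rdf:Description).

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, DC 1.1 elements and DCMI terms.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}
RDF_RDF = "{%s}RDF" % NS["rdf"]
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical mapping: target field -> candidate elements, tried in order.
# DC 1.1 and DCTERMS duplicates are treated as interchangeable, and
# dc:description is accepted when dct:abstract is missing.
FIELDS = {
    "identifier": ["dc:identifier", "dct:identifier"],
    "title": ["dct:title", "dc:title"],
    "abstract": ["dct:abstract", "dct:description", "dc:description"],
    "language": ["dct:language", "dc:language"],
    "format": ["dct:format", "dc:format"],
    "date": ["dct:date", "dc:date"],
}
CONTACT_ROLES = ("creator", "publisher", "contributor")  # hypothetical choice
LINK_TERMS = ("relation", "source")  # hypothetical choice


def value_of(el):
    """Literal text, or the rdf:resource URI for resource-valued properties."""
    text = (el.text or "").strip()
    return text or el.get(RDF_RESOURCE, "").strip()


def first_value(node, paths):
    """First non-empty value found among the given prefixed element paths."""
    for path in paths:
        for el in node.findall(path, NS):
            v = value_of(el)
            if v:
                return v
    return None


def values(node, local_name):
    """All distinct non-empty values of dc:<local_name> and dct:<local_name>."""
    found = []
    for prefix in ("dc", "dct"):
        for el in node.findall(f"{prefix}:{local_name}", NS):
            v = value_of(el)
            if v and v not in found:
                found.append(v)
    return found


def parse_dc_rdf(xml_text):
    root = ET.fromstring(xml_text)
    # Use the first node inside rdf:RDF as the record (assumption).
    node = root[0] if root.tag == RDF_RDF and len(root) else root
    record = {field: first_value(node, paths) for field, paths in FIELDS.items()}
    # Fall back to rdf:about so the non-nullable identifier is always set.
    if not record["identifier"]:
        record["identifier"] = node.get(RDF_ABOUT)
    if not record["identifier"]:
        raise ValueError("record has neither dc:identifier nor rdf:about")
    record["contacts"] = [
        {"role": role, "name": name} for role in CONTACT_ROLES for name in values(node, role)
    ]
    record["links"] = [
        {"type": term, "url": url} for term in LINK_TERMS for url in values(node, term)
    ]
    return record


# Hypothetical sample: no dc:identifier, only dc:description, a resource-valued relation.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="https://example.org/record/1">
    <dct:title>Example survey</dct:title>
    <dc:description>Only dc:description, no dct:abstract.</dc:description>
    <dc:creator>Jane Doe</dc:creator>
    <dc:publisher>Example Archive</dc:publisher>
    <dct:relation rdf:resource="https://example.org/related"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(parse_dc_rdf(SAMPLE))
```

Trying DCTERMS before DC 1.1 is an arbitrary ordering here; a real parser would need to decide which wins when both are present with different values.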

pvgenuchten, Jul 11 '24

@pvgenuchten, can you provide some sample/test-case metadata so I can reproduce this?

tomkralidis, Aug 04 '24

  • https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=e97d29a96d0e4943d757efa95101a2904826065d947740f1fb05fe6e2a0930d5
  • https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=53b3946ddb431037aa99e4fccd86fe280a18b25e8ff9612e0f963ba0f2691e4e
  • https://api.ka3.uni-koeln.de/oai/lac?identifier=hdl:11341/0000-0000-0000-35E2&verb=GetRecord&metadataPrefix=oai_dc
  • https://textgridlab.org/1.0/tgoaipmh/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=textgrid:jgtr.0
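The sample records above are OAI-PMH GetRecord responses in the oai_dc format (plain XML with the dc: elements wrapped in an oai_dc:dc container), not standalone RDF/XML. To reproduce, one could rebuild the same requests and list which DC elements each record actually carries. A minimal stdlib sketch; get_record_url and dc_fields are hypothetical helper names, not part of pycsw.

```python
import urllib.parse
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}


def get_record_url(base_url, identifier, prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL (identifier is percent-encoded)."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    )
    return f"{base_url}?{query}"


def dc_fields(response_xml):
    """Map each DC element name in the oai_dc:dc payload to its non-empty values."""
    root = ET.fromstring(response_xml)
    dc = root.find(".//oai_dc:dc", NS)
    if dc is None:
        raise ValueError("no oai_dc:dc element in response")
    fields = {}
    for el in dc:
        name = el.tag.split("}")[-1]  # strip the namespace URI
        if el.text and el.text.strip():
            fields.setdefault(name, []).append(el.text.strip())
    return fields
```

Fetching is left out so the sketch runs offline; something like `urllib.request.urlopen(get_record_url(base, identifier)).read()` would supply `response_xml`. Comparing the keys of `dc_fields` across the samples shows, for instance, which records lack a dc:identifier.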

pvgenuchten, Mar 04 '25

This Issue has been inactive for 90 days. In order to manage maintenance burden, it will be automatically closed in 7 days.

github-actions[bot], Sep 21 '25

This Issue has been closed due to there being no activity for more than 90 days.

github-actions[bot], Oct 05 '25