OAI-PMH integration

Open skasberger opened this issue 4 years ago • 1 comment

Integrate OAI-PMH endpoint and data conversion.

Requirements

  • Mapping of data from OAI-PMH endpoint (DDI XML and/or DC)
  • Import of data
  • Export of data
  • XML schema
  • Validation against schema
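
As a rough illustration of the mapping requirement, the following is a minimal sketch that uses only the Python standard library to turn one `oai_dc` record into a plain dictionary. The function name `map_dc_record`, the output layout, and the sample record are assumptions made for illustration; they are not part of pyDataverse and not real output from any server.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample record, invented for this sketch.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>doi:10.0000/EXAMPLE</identifier>
    <datestamp>2021-04-02T00:00:00Z</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example Survey 2020</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:creator>Roe, Richard</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def map_dc_record(record_xml: str) -> dict:
    """Map one OAI-PMH record with an oai_dc payload to a dictionary."""
    root = ET.fromstring(record_xml)
    result = {
        "identifier": root.findtext("oai:header/oai:identifier", namespaces=NS),
        "fields": {},
    }
    dc_root = root.find("oai:metadata/oai_dc:dc", NS)
    if dc_root is not None:  # deleted records carry no metadata element
        for element in dc_root:
            # Strip the "{namespace}" prefix to get the bare DC field name.
            name = element.tag.split("}", 1)[-1]
            value = (element.text or "").strip()
            # DC fields are repeatable, so collect values in lists.
            result["fields"].setdefault(name, []).append(value)
    return result


mapped = map_dc_record(SAMPLE_RECORD)
print(mapped["identifier"], mapped["fields"])
```

Keeping every field as a list reflects that Dublin Core elements may repeat; a later conversion step could decide which of them become single-valued.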

ACTIONS

0. Pre-Requisites

1. Research

  • [ ] check Python modules
  • [ ] Get in touch with Carsten Thiel about the progress of the tool for validating Metadata Server output and creating the CESSDA metadata model from it

pyoai

from oaipmh.client import Client
from oaipmh.metadata import MetadataRegistry, oai_dc_reader

url = "https://data.aussda.at/oai"

# Register the Dublin Core reader so that oai_dc payloads can be parsed.
registry = MetadataRegistry()
registry.registerReader('oai_dc', oai_dc_reader)
client = Client(url, registry)

# Iterate over all records exposed in the oai_dc format.
for record in client.listRecords(metadataPrefix='oai_dc'):
    print(record)

oai-harvest

oai-harvest --set "all_published" --metadataPrefix "oai_ddi" https://data.aussda.at/oai
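
For orientation, the command above roughly corresponds to an OAI-PMH `ListRecords` request with the `metadataPrefix` and `set` arguments defined by the protocol. The sketch below only builds such a request URL with the standard library and performs no network access; the helper name `build_list_records_url` is hypothetical, and how oai-harvest constructs its requests internally is not confirmed here.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_records_url(
    base_url: str, metadata_prefix: str, set_spec: Optional[str] = None
) -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # "set" is optional in OAI-PMH and restricts the harvest to one set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


url = build_list_records_url(
    "https://data.aussda.at/oai", "oai_ddi", "all_published"
)
print(url)
```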

sickle

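from sickle import Sickle

sickle = Sickle('https://data.aussda.at/oai')
# ListRecords returns an iterator over the harvested records.
records = sickle.ListRecords(metadataPrefix='oai_ddi')
record = next(records)
# Inspect the header, its identifier, and the parsed metadata.
record.header
record.header.identifier
record.metadata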
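
Sickle follows OAI-PMH resumption tokens on its own while iterating. If the harvesting were implemented by hand instead, paging would need to be handled explicitly. The following is a minimal standard-library sketch of that single step; the function name `next_resumption_token` and both sample responses are assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, heavily shortened responses (records omitted).
PAGE_WITH_TOKEN = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

LAST_PAGE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>
"""


def next_resumption_token(response_xml: str) -> Optional[str]:
    """Return the token for the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the final page.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


print(next_resumption_token(PAGE_WITH_TOKEN))
print(next_resumption_token(LAST_PAGE))
```

A hand-written harvester would repeat the request with `verb=ListRecords&resumptionToken=...` until this function returns `None`.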

2. Plan

  • [ ] Define requirements

3. Implement

  • [ ] Write tests
  • [ ] Write code
  • [ ] Write and update Docs
  • [ ] Write Docstrings
  • [ ] Run pytest
  • [ ] Run tox
  • [ ] Run pylint
  • [ ] Run mypy

4. Follow Ups

  • [ ] Review
    • [ ] Code
    • [ ] Tests
    • [ ] Docs

skasberger, Apr 02 '21 21:04

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

pdurbin, Mar 04 '24 16:03