
Mapping DDI XML

Open skasberger opened this issue 6 years ago • 3 comments

Implement mapping from and to DDI XML.

Requirements

  • DDI XML from NESSTAR mapping
  • DDI XML from OAI-PMH endpoint mapping
  • DDI XML from frontend download mapping
  • import from all DDI XML versions
  • import from all DDI XML versions (lower priority)
  • validate data against schema
  • XML schemas

ACTIONS

0. Pre-Requisites

  • [ ] Part of re-factor models module #102
  • [ ] is there already a Python module out there, which works with DDI XML (or code, or developer)?
  • [ ] Check out DANS service

1. Research

  • [ ]

2. Plan

  • [ ] Define requirements

3. Implement

  • [ ] Write tests
    • [ ] Create Mapping File
  • [ ] Write code
  • [ ] Update Docs
    • [ ] Basic Usage
    • [ ] Advanced Usage
    • [ ] Quickstart
  • [ ] Write Docstrings
  • [ ] Run pytest
  • [ ] Run tox
  • [ ] Run pylint
  • [ ] Run mypy

4. Follow Ups

  • [ ] Review
    • [ ] Code
    • [ ] Tests
    • [ ] Docs

Follow-Ups

  • [ ] Re-factor models module #102

skasberger • Jun 22 '19 18:06

Here are some resources:

Functionalities:

  • Data types: Datasets, Datafiles
  • Validate data before export and import
  • Import to DVObjects Dataset and Datafile
  • Export to DVObjects Dataset and Datafile
  • Include NESSTAR Dialect
  • add custom mapping with custom XSLT
  • Use/create XSLT stylesheets: CESSDA has some, the DANS ddi-converter uses one too, and there is another one on GitHub.

Mapping for DDI 2.5. Some attributes mapped:

abstract -> dsDescriptionValue
notes -> notesText
titl -> title
subTitl -> subtitle
altTitl -> alternativeTitle
grantNo -> grantNumberValue
grantNo (agency) -> grantNumberAgency
timePrd (event="start") -> timePeriodCoveredStart
timePrd (event="end") -> timePeriodCoveredEnd
collDate (event="start") -> dateOfCollectionStart
collDate (event="end") -> dateOfCollectionEnd
dataKind -> kindOfData
serName -> seriesName
serInfo -> seriesInformation
relMat -> relatedMaterial
relStdy -> relatedDatasets
othRefs -> otherReferences
srcOrig -> originOfSources
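
The mapping above could be sketched as a lookup table plus a small import function. This is only a minimal sketch, not pyDataverse's implementation: the names `DDI_TO_DATAVERSE` and `ddi_to_fields` are hypothetical, the sample XML is made up, and it assumes that each value sits in the element text and that `grantNo (agency)` refers to the `agency` attribute of `grantNo`. It uses only the standard library and does no schema validation.

```python
import xml.etree.ElementTree as ET

# (DDI element name, value of its "event" attribute or None) -> Dataverse field.
# Taken from the mapping list above; "title" is assumed to be the intended
# Dataverse field name for "titl".
DDI_TO_DATAVERSE = {
    ("abstract", None): "dsDescriptionValue",
    ("notes", None): "notesText",
    ("titl", None): "title",
    ("subTitl", None): "subtitle",
    ("altTitl", None): "alternativeTitle",
    ("grantNo", None): "grantNumberValue",
    ("timePrd", "start"): "timePeriodCoveredStart",
    ("timePrd", "end"): "timePeriodCoveredEnd",
    ("collDate", "start"): "dateOfCollectionStart",
    ("collDate", "end"): "dateOfCollectionEnd",
    ("dataKind", None): "kindOfData",
    ("serName", None): "seriesName",
    ("serInfo", None): "seriesInformation",
    ("relMat", None): "relatedMaterial",
    ("relStdy", None): "relatedDatasets",
    ("othRefs", None): "otherReferences",
    ("srcOrig", None): "originOfSources",
}


def ddi_to_fields(xml_text):
    """Collect mapped values from a DDI document into {field name: [values]}."""
    root = ET.fromstring(xml_text)
    fields = {}
    for elem in root.iter():
        # Drop the "{namespace}" prefix so the lookup works with or without one.
        tag = elem.tag.rsplit("}", 1)[-1]
        name = DDI_TO_DATAVERSE.get((tag, elem.get("event")))
        text = (elem.text or "").strip()
        if name and text:
            fields.setdefault(name, []).append(text)
        # Assumption: the funding agency is stored in grantNo's "agency" attribute.
        if tag == "grantNo" and elem.get("agency"):
            fields.setdefault("grantNumberAgency", []).append(elem.get("agency"))
    return fields


# Made-up sample document using the DDI Codebook 2.5 namespace.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt><titl>Example Study</titl></titlStmt>
      <prodStmt><grantNo agency="FWF">P-123</grantNo></prodStmt>
    </citation>
    <stdyInfo>
      <abstract>A short description.</abstract>
      <sumDscr>
        <timePrd event="start">2019-01-01</timePrd>
        <timePrd event="end">2019-12-31</timePrd>
        <dataKind>survey</dataKind>
      </sumDscr>
    </stdyInfo>
  </stdyDscr>
</codeBook>"""

if __name__ == "__main__":
    for field, values in ddi_to_fields(SAMPLE).items():
        print(field, values)
```

Keeping the table as plain data would make it easy to move into a separate mapping file later, or to swap in a NESSTAR-specific variant.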

skasberger • Jun 26 '20 01:06

Notes to myself:

  • https://wiki.selfhtml.org/wiki/XML/XSL/XSLT
  • https://de.wikipedia.org/wiki/XSL_Transformation
  • https://www.torsten-horn.de/techdocs/java-xsd.htm
  • https://www.w3schools.com/xml/schema_intro.asp
  • https://de.wikipedia.org/wiki/Dokumenttypdefinition
  • https://de.wikipedia.org/wiki/XPath
  • Harvard course videos
    • https://www.youtube.com/watch?v=x8kMELlNaYg&list=WL&index=67&t=2s
    • https://www.youtube.com/watch?v=-Wft5dD-1ig&list=WL&index=68&t=14s
    • https://www.youtube.com/watch?v=YkAZlQgPXG4&list=WL&index=69&t=30s
    • https://www.youtube.com/watch?v=hOR132CodOU&list=WL&index=70&t=31s
    • https://www.youtube.com/watch?v=qY2Ezw786ko&list=WL&index=71&t=12s
    • https://www.youtube.com/watch?v=6Zvw3kmJ0KA&list=WL&index=72&t=0s
  • https://de.wikipedia.org/wiki/Extensible_Stylesheet_Language

skasberger • Jun 27 '20 00:06

As discussed during the 2024-02-14 meeting of the pyDataverse working group, we are closing old milestones in favor of a new project board at https://github.com/orgs/gdcc/projects/1 and removing issues (like this one) from those old milestones. Please feel free to join the working group! You can find us at https://py.gdcc.io and https://dataverse.zulipchat.com/#narrow/stream/377090-python

pdurbin • Mar 04 '24 16:03