clarin-dspace

export new DataCite/CCMM metadata over OAI-PMH

Open · kosarko opened this issue 11 months ago · 4 comments

For NMD (the national metadata directory) and OpenAIRE (if they have updated their guidelines).

Use the current version of the DataCite metadata schema, the one already used in the Charles University repository.

Also verify that DOI minting works (for example, DOIs together with versioning).

kosarko · Jan 13 '25

DataCite is now optional; we will need to do CCMM: https://github.com/techlib/CCMM

kosarko · Jun 11 '25

This is in the context of NRP, so assume the submission process is "DataCite-like" (https://github.com/ufal/clarin-dspace/blob/7eb7fc883c3beb42201a4cbc9d3a54bdeb363862/dspace/config/submission-forms.xml).

kosarko · Jul 25 '25

@kosarko That means we need to:

  1. implement a crosswalk in DSpace for https://techlib.github.io/CCMM/dataset/schema.xsd
  2. add new DataCite forms and input fields to submission-forms.xml
  3. add a new datacite metadata schema (https://github.com/ufal/clarin-dspace/blob/7eb7fc883c3beb42201a4cbc9d3a54bdeb363862/dspace/config/registries/datacite.xml)

Is that it? Or am I missing something?
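To make step 1 concrete: at its core a crosswalk is a mapping from DSpace metadata fields to elements of the target schema. In DSpace's OAI module the real crosswalk would, as far as we know, typically be an XSLT stylesheet, so this Python sketch only illustrates the mapping idea. The DSpace field names are standard Dublin Core ones, but the target element names in `FIELD_MAP` and the `dataset` root tag are hypothetical placeholders, not taken from the CCMM XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: DSpace field -> target element name.
# Target names are placeholders, NOT the real CCMM element names.
FIELD_MAP = {
    "dc.title": "title",
    "dc.contributor.author": "creator",
    "dc.date.issued": "publicationYear",
    "dc.identifier.uri": "identifier",
}


def crosswalk(item: dict[str, list[str]], root_tag: str = "dataset") -> ET.Element:
    """Map a flat DSpace-style metadata dict to an XML tree.

    Unmapped fields are skipped; multi-valued fields produce one
    element per value, in their original order.
    """
    root = ET.Element(root_tag)
    for field, values in item.items():
        target = FIELD_MAP.get(field)
        if target is None:
            continue  # unmapped field; a real crosswalk might log or report it
        for value in values:
            ET.SubElement(root, target).text = value
    return root


if __name__ == "__main__":
    item = {
        "dc.title": ["Example corpus"],
        "dc.contributor.author": ["Doe, Jane", "Roe, Richard"],
        "local.unmapped": ["ignored"],
    }
    print(ET.tostring(crosswalk(item), encoding="unicode"))
```

Fields that fall through the mapping are a cheap first signal of what a given submission form does not cover.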

milanmajchrak · Aug 21 '25

@milanmajchrak

  1. The crosswalk for https://techlib.github.io/CCMM/dataset/schema.xsd: I think that should be the one, and according to https://github.com/techlib/CCMM/issues/61 the metadataPrefix should be called ccmm-xml.
  2. That's a bit tricky. I'm assuming that the submission-forms.xml I've linked will be the default submission setup in new repository instances started on the NRP platform, and the crosswalk should definitely work with those. I don't particularly like the idea of maintaining a "linguistic" fork and an "NRP" fork, but at the same time submission-forms.xml is getting painfully big. Is there a mechanism to use a different form config file? If there are both submission-forms.xml and submission-forms-2.xml, can I configure a particular instance to use submission-forms-2.xml, or is submission-forms.xml hardcoded? Or could we use an XML processor capable of XInclude and split submission-forms.xml into multiple files? With the Docker Compose setup you can bind-mount the file, but we would still need to manage distributing the files somehow.
  3. I guess I don't mind adding the schema to every instance, do you?
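Whether DSpace's own form reader expands XInclude is exactly the open question above, so this is only a sketch of the merge step that an XInclude-capable processor performs; it could, for example, run as a preprocessing step at build or deploy time. It uses Python's standard `xml.etree.ElementInclude`. The split file name `nrp-forms.xml` and the simplified `input-forms` / `form-definitions` layout are assumptions for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical split: the main file pulls in a form kept in a separate file.
MAIN = """<input-forms xmlns:xi="http://www.w3.org/2001/XInclude">
  <form-definitions>
    <xi:include href="nrp-forms.xml"/>
  </form-definitions>
</input-forms>"""

PART = """<form name="nrp-datacite"><row/></form>"""


def load_with_xinclude(path: str) -> ET.Element:
    """Parse an XML file and replace its xi:include elements with the included content."""
    root = ET.parse(path).getroot()
    # base_url (Python 3.9+) makes relative hrefs resolve next to the main file.
    ElementInclude.include(root, base_url=path)
    return root


def write_and_load(directory: str) -> ET.Element:
    """Write the two example files into `directory` and load the merged tree."""
    main_path = os.path.join(directory, "submission-forms.xml")
    with open(main_path, "w", encoding="utf-8") as f:
        f.write(MAIN)
    with open(os.path.join(directory, "nrp-forms.xml"), "w", encoding="utf-8") as f:
        f.write(PART)
    return load_with_xinclude(main_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        merged = write_and_load(tmp)
        print(merged.find("form-definitions/form").get("name"))  # nrp-datacite
```

Writing the merged tree back out as a single submission-forms.xml would keep the per-instance files small without requiring any change in how DSpace reads the file.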

Can the crosswalk be written so that it produces valid XML even when run on LINDAT data (or at least a reasonable subset of it)? If not, what metadata is missing from the current LINDAT submission that we would need to add?
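One way to try the crosswalk on existing LINDAT data is to harvest it over OAI-PMH with the new prefix and look at what comes back. A minimal sketch, assuming the ccmm-xml prefix from the CCMM issue and a hypothetical endpoint URL; it builds a standard `ListRecords` request and summarizes one response page (record count, OAI-PMH error codes such as `cannotDisseminateFormat`, resumption token). It does not validate against the CCMM XSD, since the Python standard library has no XSD validator; a tool such as `xmllint --schema` would be needed for that step.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, prefix: str = "ccmm-xml",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for one metadataPrefix."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def summarize_response(xml_text: str) -> dict:
    """Count records and collect OAI-PMH error codes from one response page."""
    root = ET.fromstring(xml_text)
    errors = [e.get("code") for e in root.findall("oai:error", OAI_NS)]
    records = root.findall("oai:ListRecords/oai:record", OAI_NS)
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    return {
        "records": len(records),
        "errors": errors,
        "resumption_token": token.text if token is not None else None,
    }


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the repository's real OAI-PMH URL.
    print(list_records_url("https://repository.example.org/server/oai/request"))
```

Following the resumption token page by page and feeding each record to an XSD validator would give the list of items (and therefore missing fields) that break the output.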

kosarko · Aug 29 '25

@milanmajchrak there will be a small update to the CCMM schema soon; for now, I guess we can only watch the changes on GitHub: https://github.com/techlib/CCMM/compare/aaf1060...1.1.0

kosarko · Oct 08 '25