metadatamanagement
PID registration for variables
This has to be split up into separate issues:
- [x] mapping of metadata to the schema of the PID provider (see below)
- [ ] prepare MDM for PID import: add a new PID attribute to the variable detail page
- [ ] extract relevant metadata, then follow one of the two options:
  - [ ] send metadata via API to the PID provider
  - [x] send metadata via .json export to the PID provider
- [ ] import PIDs and assign them to variables
- [ ] documentation of the new process in the MDM documentation
- [ ] documentation in the FDZ process documentation
API for registration: https://labs.da-ra.de/nfdi/
YouTube tutorial for the API: https://youtu.be/fm8T-hlhsXg
Main Report 2022: https://zenodo.org/records/6397367
PID Registration Service Demo Application – how to use the test system (video/audio): https://youtu.be/fm8T-hlhsXg
KonsortSWD Measure 5.1: use cases description extended report - 2023 https://zenodo.org/records/7588944
Software KonsortSWD PID Registration Service - 2023 https://zenodo.org/records/10277054
Software Documentation KonsortSWD Measure 5.1: extended report on the architecture of the PID registration service - 2023 https://zenodo.org/records/10401641
Documentation of the DIW implementation: https://git.soep.de/kwenzig/pids/-/wikis/home
PID4NFDI: https://base4nfdi.de/projects/pid4nfdi https://os.helmholtz.de/aktuelles/projekte/pid4nfdi/
source: https://zenodo.org/records/6397367
ToDo
- [ ] Login?
- [ ] Finalize the mapping
- [ ] What does it mean that the registration process takes longer? Is there a callback function?
- [ ] PID syntax
Mapping:
Field | MDM mapping |
---|---|
studyDOI | |
variableName | Variable::name |
variableLabel | Variable::label |
pidProposal | <syntax?> |
landingPage | <MDM Variable URL> |
resourceType | Download? |
title | |
creators | |
publisher | |
publicationDate | |
availability | |
description | |
Mapping:
Field | MDM mapping | Example |
---|---|---|
studyDOI | doi data package | 10.21249/DZHW:nac2018:2.0.0 |
variableName | [Variable::name] | adbi01 |
variableLabel | [Variable::label] | Status of the doctorate |
pidProposal | 21.T11998/dzhw:[dapid]_[varname]:[dapid_version] | 21.T11998/dzhw:nac2018_adbi01:2.0.0 |
landingPage | URL of the variable detail page + version parameter | https://metadata.fdz.dzhw.eu/en/variables/var-nac2018-ds1-adbi01?page=1&size=10&type=questions&version=2.0.0 |
resourceType | variable | variable |
title | [Variable::name]:[Variable::label] | adbi01:Status of the doctorate |
creators | list: projectContributors | Adrian, Dominik; Ambrasat, Jens; Briedis, Kolja; Friedrich, Christian; Fuchs, Amrei; Geils, Matthias; Kovalova, Iryna; Lange, Janine; Lietz, Almuth; Martens, Bernd; Redeke, Susanne; Ruß, Uwe; Sarcletti, Andreas; Schwabe, Ulrike; Seifert, Moritz; Siegel, Madeleine; Teichmann, Carola; Tesch, Jakob; de Vogel, Susanne; Wegner, Antje; Mühleck, Kai; Scheller, Percy; Berroth, Lara; Jänsch, Vanessa K. |
publisher | FDZ-DZHW | FDZ-DZHW |
publicationDate | publication date of the current version of the data package | Apr 30, 2024 |
availability | access ways / subdatasets of the dataset(s) the variable belongs to | SUF: On-Site, SUF: Remote-Desktop, SUF: Download, CUF: Download |
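A minimal Python sketch of how one variable could be turned into a registration record following the mapping table above. The `Variable` class, `build_pid_record`, and their parameters are hypothetical (not MDM code). The PID prefix `dzhw` follows the test PID registered later in this issue, and the landing page drops the paging parameters (`page`, `size`, `type`) on the assumption that only `version` matters.

```python
from dataclasses import dataclass

# Values taken from the examples in the mapping table; treat them as assumptions.
HANDLE_PREFIX = "21.T11998"
LANDING_BASE = "https://metadata.fdz.dzhw.eu/en/variables"


@dataclass
class Variable:
    """Hypothetical minimal view of an MDM variable (field names are assumptions)."""
    id: str               # e.g. "var-nac2018-ds1-adbi01"
    name: str             # e.g. "adbi01"
    label_en: str         # English variable label
    data_package_id: str  # e.g. "nac2018"
    version: str          # data package version, e.g. "2.0.0"


def build_pid_record(var: Variable, study_doi: str, creators: list[str],
                     publication_date: str, availability: str) -> dict:
    """Map one variable onto the PID provider fields listed in the mapping table."""
    return {
        "studyDOI": study_doi,
        "variableName": var.name,
        "variableLabel": var.label_en,
        # [dapid]_[varname]:[dapid_version]
        "pidProposal": f"{HANDLE_PREFIX}/dzhw:{var.data_package_id}_{var.name}:{var.version}",
        # landing page with only the version parameter (paging parameters dropped)
        "landingPage": f"{LANDING_BASE}/{var.id}?version={var.version}",
        "resourceType": "variable",
        "title": f"{var.name}:{var.label_en}",
        "creators": creators,
        "publisher": "FDZ-DZHW",
        "publicationDate": publication_date,
        "availability": availability,  # a single enumeration value, e.g. "Delivery"
    }


adbi01 = Variable("var-nac2018-ds1-adbi01", "adbi01", "Status of the doctorate",
                  "nac2018", "2.0.0")
record = build_pid_record(adbi01, "10.21249/DZHW:nac2018:2.0.0",
                          ["Adrian, Dominik", "Ambrasat, Jens"], "Apr 30, 2024", "Delivery")
print(record["pidProposal"])  # 21.T11998/dzhw:nac2018_adbi01:2.0.0
```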
We should initially register only the latest version of each variable rather than creating PIDs for all versions.
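The "latest version only" rule could look like this, assuming a variable is identified across versions by its data package ID plus variable name and that versions are dotted numbers like `2.0.0`. `parse_version` and `latest_versions` are hypothetical helpers.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string like '2.0.0' into a comparable tuple (2, 0, 0)."""
    return tuple(int(part) for part in version.split("."))


def latest_versions(variables: list[tuple[str, str, str]]) -> dict[tuple[str, str], str]:
    """Keep only the highest version per (data package ID, variable name)."""
    latest: dict[tuple[str, str], str] = {}
    for package_id, name, version in variables:
        key = (package_id, name)
        if key not in latest or parse_version(version) > parse_version(latest[key]):
            latest[key] = version
    return latest


print(latest_versions([("nac2018", "adbi01", "1.0.0"),
                       ("nac2018", "adbi01", "2.0.0"),
                       ("gra2005", "bocc06o", "1.0.0")]))
# {('nac2018', 'adbi01'): '2.0.0', ('gra2005', 'bocc06o'): '1.0.0'}
```

Comparing integer tuples instead of strings keeps `1.10.0` above `1.9.0`.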
Current state of development:
- variable metadata can be exported as a JSON file via the "external services" entry in the administration menu
- the export takes approximately 15 minutes for ~18,500 variables of 28 currently released projects (@AndyDaniel1 we should check whether this includes all the projects you want to register variables for)
API research results
- Verify:
  - a set of exported variables was verified using the verify endpoint
  - verification returns a list of errors, if there are any

Result without errors:

```json
{
  "documentUri": "10.17889/DZHW:gra2005:1.0.0",
  "constraintViolation": []
}
```

Result with errors:

```json
{
  "documentUri": "10.17889/DZHW:gra2005:1.0.0",
  "constraintViolation": [
    {
      "message": "$.variables[0].availability: does not have a value in the enumeration [Download, Delivery, OnSite, NotAvailable, Unknown]",
      "locationInfo": "$.variables[0].availability"
    }
  ]
}
```
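To map verify errors back to the exported variables, the `locationInfo` JSONPath can be parsed. A minimal sketch that relies only on the response fields shown above (`constraintViolation`, `message`, `locationInfo`); the helper name and the `-1` bucket for locations outside `$.variables[...]` are assumptions.

```python
import json
import re

# Extracts the variable index from locations such as "$.variables[0].availability".
VARIABLE_INDEX = re.compile(r"^\$\.variables\[(\d+)\]")


def violations_by_variable(response_body: str) -> dict[int, list[str]]:
    """Group the constraint violations of a verify response by variable index.

    An empty dict means the export passed verification. Violations whose location
    does not point into $.variables[...] are collected under -1.
    """
    result = json.loads(response_body)
    grouped: dict[int, list[str]] = {}
    for violation in result.get("constraintViolation", []):
        match = VARIABLE_INDEX.match(violation["locationInfo"])
        index = int(match.group(1)) if match else -1
        grouped.setdefault(index, []).append(violation["message"])
    return grouped


with_errors = """{
  "documentUri": "10.17889/DZHW:gra2005:1.0.0",
  "constraintViolation": [{
    "message": "$.variables[0].availability: does not have a value in the enumeration [Download, Delivery, OnSite, NotAvailable, Unknown]",
    "locationInfo": "$.variables[0].availability"
  }]
}"""
print(list(violations_by_variable(with_errors)))  # [0]
```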
- Register:
  - authentication was successful
  - registering two test variables was successful
  - a successful registration returns a `jobId` that can be used to query the status of the job
- Job status:
  - the test job was queried successfully, resulting in:

```json
[
  {
    "position": 0,
    "validationStatus": "valid",
    "validationErrors": [],
    "pid": "21.T11998/dzhw:gra2005-1.0.0_bocc06o_v1_test:1.0.0",
    "registrationResult": "created:201",
    "status": "FINISHED",
    "lastUpdate": "2024-08-02 10:08:40"
  }
]
```
Jobs can also be viewed online: Status
A complete PID link will look like this: https://doi.org/21.T11998/dzhw:gra2005-1.0.0_bocc06o_v1_test:1.0.0
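Before PIDs are written back to MDM, a job status response could be evaluated like this. Only the values `FINISHED` and `valid` appear in this issue, so every other combination is treated as not (yet) registered. `summarize_job` is a hypothetical helper, and the `https://doi.org/` prefix is taken from the example link above.

```python
import json

# Resolver prefix as in the example PID link in this issue.
RESOLVER = "https://doi.org/"


def summarize_job(job_status_body: str) -> dict[str, list[str]]:
    """Split a job status response into resolvable PID links and entries still open.

    Only "FINISHED" / "valid" are known from the test run; every other combination
    is treated as not (yet) registered.
    """
    links: list[str] = []
    still_open: list[str] = []
    for entry in json.loads(job_status_body):
        if entry.get("status") == "FINISHED" and entry.get("validationStatus") == "valid":
            links.append(RESOLVER + entry["pid"])
        else:
            still_open.append(f"position {entry.get('position')}: {entry.get('status')}")
    return {"links": links, "open": still_open}


job_status = """[{
  "position": 0, "validationStatus": "valid", "validationErrors": [],
  "pid": "21.T11998/dzhw:gra2005-1.0.0_bocc06o_v1_test:1.0.0",
  "registrationResult": "created:201", "status": "FINISHED",
  "lastUpdate": "2024-08-02 10:08:40"
}]"""
print(summarize_job(job_status)["links"])
# ['https://doi.org/21.T11998/dzhw:gra2005-1.0.0_bocc06o_v1_test:1.0.0']
```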
@AndyDaniel1 There are two further questions:
1. The variable label is an i18n-string on MDM side. Should the label language for PID registration be chosen based on the selection for the language of the landing page we implemented for DA|RA? The same issue arises with the landing page for PID registration.
For the PID, the English version of the landing page should be chosen.
2. The "availability" field needs to be filled with a fixed value. Only one value from the list [Download, Delivery, OnSite, NotAvailable, Unknown] can be selected. As some data packages have more than one access way, it needs to be decided how this field should be filled for the various combinations of access ways.
If an access way is defined (CUF, download, remote, onsite), the type should be `Delivery`.
If the access way is "not available", it should be set to `NotAvailable`.
We don't have an "unknown" type, but if we did, it should be set to `Unknown`.
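The access way rule above could be expressed as a small function. The access way identifiers are hypothetical placeholders for the MDM values (CUF download, SUF download, SUF remote desktop, SUF on-site), and mapping missing information (`None`) to `Unknown` is an assumption, since MDM currently has no such case.

```python
from typing import Iterable, Optional

# Hypothetical identifiers standing in for the MDM access ways
# CUF: Download, SUF: Download, SUF: Remote-Desktop, SUF: On-Site.
DEFINED_ACCESS_WAYS = {"download-cuf", "download-suf", "remote-desktop-suf", "onsite-suf"}


def availability(access_ways: Optional[Iterable[str]]) -> str:
    """Collapse the access ways of a variable's dataset(s) into one enumeration value."""
    if access_ways is None:
        return "Unknown"       # no information at all (assumed; does not occur in MDM today)
    if DEFINED_ACCESS_WAYS & set(access_ways):
        return "Delivery"      # at least one access way (CUF, download, remote, onsite) defined
    return "NotAvailable"      # only "not available" or nothing at all


print(availability(["download-cuf", "onsite-suf"]))  # Delivery
print(availability(["not-accessible"]))              # NotAvailable
```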
@AndyDaniel1 regarding the field `variableLabel`:
we need to decide how to deal with missing translations. The current implementation only exports variables that have an English variable label. However, quite a lot of datasets only contain variables whose labels have no proper English translation.
So either: a) we implement a fallback that uses the German label, or b) we don't register the variable.
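Both options fit into one helper, sketched here under the assumption that an MDM i18n string can be treated as a `{"de": ..., "en": ...}` mapping. `pid_label` and the `allow_german_fallback` switch are hypothetical, and "Promotionsstatus" is only an illustrative label. Returning `None` means the variable is skipped (option b).

```python
from typing import Optional


def pid_label(label: dict[str, Optional[str]], allow_german_fallback: bool) -> Optional[str]:
    """Choose the label used for PID registration from an i18n string.

    Returns None if the variable should not be registered (option b).
    """
    english = (label.get("en") or "").strip()
    if english:
        return english
    if allow_german_fallback:  # option a: fall back to the German label
        return (label.get("de") or "").strip() or None
    return None                # option b: skip the variable


print(pid_label({"de": "Promotionsstatus", "en": ""}, allow_german_fallback=True))
# Promotionsstatus
```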
Regarding the release workflow, we also need to discuss how to perform the registration asynchronously: right now the entire export process takes too long and would prevent the release workflow from completing right away. It therefore needs to run in the background, and we need to inform the user that it can take a couple of minutes after a release until it is registered with DA|RA.