PID reconcile command
What this PR does / why we need it:
Please refer to #10501
Which issue(s) this PR closes:
Closes #10501
Special notes for your reviewer/ Suggestions on how to test this:**: Configure two PID Provider via docker-compose-file Alter this line:
-Ddataverse.pid.providers=fake,perma
add those lines to the dataverse env:
-Ddataverse.pid.perma.type=perma
-Ddataverse.pid.perma.label=PermaProvider
-Ddataverse.pid.perma.permalink.base-url=https://example.org/
-Ddataverse.pid.perma.permalink.separator=-
-Ddataverse.pid.perma.authority=identifier
Start up dataverse. Create a new dataset with a datafile. Then change the pidProvider of the dataset.
curl -X PUT --location "http://localhost:8080/api/v1/datasets/{id}/pidReconcile/perma" \
-H "X-Dataverse-key: $KEY" -d 'perma'
Then reconcile the pid via API call:
curl -X PUT --location "http://localhost:8080/api/v1/datasets/{id}/pidReconcile" \
-H "X-Dataverse-key: $KEY"
Verify that the PID changes in the UI and a notification apprears.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: No. Is there a release notes update needed for this change?: Yes, will be added later.
coverage: 22.831% (+0.08%) from 22.751% when pulling ee4fdeef0d6193867d33daa337c6fbae7242f822 on johannes-darms:feat/reconcilepid into 6be4f20f216d1f3d84cdf44fa98d615a59c5bfe2 on IQSS:develop.
@johannes-darms - I made several related comments about avoiding mixing the effective Pid generator and PidProvider for the existing PID concepts. If that's not clear, let's have a call or slack, etc.
I made several related comments about avoiding mixing the effective Pid generator and PidProvider for the existing PID concepts. If that's not clear, let's have a call or slack, etc.
Thanks for the explanation I wasn't aware of the distinction. I'll update the code accordingly.
@qqmyers : I've adapted the PR according to your feedback. Could you have another look?
@qqmyers it took a while, but I finally managed to implement all your change requests. Would you like to look over it again?
@johannes-darms heads up that there are merge conflicts.
@pdurbin @qqmyers I've resolved the merge conflicts and addressed the open review comment. Please feel free to re-review this PR. :+1:
However, I am not sure why the Jenkins check is currently failing. If there's an error for me to fix, could you let me know what it is?
@vera - I'm seeing the following in the log. If it doesn't make sense, let me know and I can dig for more info:
[INFO] Running edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest
Feb 05, 2025 4:09:14 PM edu.harvard.iq.dataverse.DataCitation getDateFrom
WARNING: Unable to find citation date for datasetversion: null
Feb 05, 2025 4:09:14 PM edu.harvard.iq.dataverse.DataCitation getDateFrom
WARNING: Unable to find citation date for datasetversion: null
[ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.132 s <<< FAILURE! -- in edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest
[ERROR] edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFieldsMultipleGeoLocations -- Time elapsed: 0.039 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "name" is null
at edu.harvard.iq.dataverse.pidproviders.PidUtil.getPidProvider(PidUtil.java:229)
at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:107)
at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:84)
at edu.harvard.iq.dataverse.pidproviders.doi.datacite.DOIDataCiteRegisterService.getMetadataFromDvObject(DOIDataCiteRegisterService.java:169)
at edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFieldsMultipleGeoLocations(XmlMetadataTemplateTest.java:313)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[ERROR] edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreation -- Time elapsed: 0.002 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "name" is null
at edu.harvard.iq.dataverse.pidproviders.PidUtil.getPidProvider(PidUtil.java:229)
at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:107)
at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:84)
at edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreation(XmlMetadataTemplateTest.java:177)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
[ERROR] edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFields -- Time elapsed: 0.014 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "name" is null
at edu.harvard.iq.dataverse.pidproviders.PidUtil.getPidProvider(PidUtil.java:229)
at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:107)
at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:84)
at edu.harvard.iq.dataverse.pidproviders.doi.datacite.DOIDataCiteRegisterService.getMetadataFromDvObject(DOIDataCiteRegisterService.java:169)
at edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFields(XmlMetadataTemplateTest.java:248)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Thanks! It seems there was some interaction between the new tests and some old tests, because the new tests didn't remove the PID providers they created. I've pushed a commit cleaning up the tests so they should now pass.
By the way, since the test steps for testing the API in the initial comment are outdated, here are some updated steps:
-
Start up Dataverse with at least two configured PID providers (e.g. by editing
docker-compose-dev.yamlas described in the initial comment) -
Create collection configured for one of the PID providers, e.g. Permalinks
-
Create dataset
-
Use API to reconcile PID of that dataset (expected: no effect)
$ curl -X PUT "http://localhost:8080/api/datasets/:persistentId/pidReconcile?persistentId=$PERSISTENT_ID" -H "X-Dataverse-key: $API_KEY"{"status":"ERROR","message":"PID Provider perma is same as configured. This Operation has no effect!"} -
Use API to set PID generator of that dataset (to another PID provider than the collection, e.g. fake DOIs)
curl -X PUT -H "X-Dataverse-key: $API_TOKEN" -H 'Content-type: application/json' -d fake "http://localhost:8080/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_ID" -
Use API to reconcile PID of that dataset (expected: dataset receives new PID)
curl -X PUT "http://localhost:8080/api/datasets/:persistentId/pidReconcile?persistentId=$PERSISTENT_ID" -H "X-Dataverse-key: $API_TOKEN"{"status":"OK","data":{"message":"doi:10.5072/FK2/CR3K6U"}} -
Repeat steps 3-6, but publish dataset before trying to reconcile PID (expected: no effect)
$ curl -X PUT "http://localhost:8080/api/datasets/:persistentId/pidReconcile?persistentId=$PERSISTENT_ID" -H "X-Dataverse-key: $API_KEY"{"status":"ERROR","message":"Dataset already published, cannot alter PID Provider"}
@vera = FWIW: I think I found the cause of the most recent test fails - see #11225.
For the record, this run yesterday was successful: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10567/12/testReport/ . All API tests passed in about 17 minutes.
Ok, yes, in the latest run at https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10567/13/console I'm seeing this:
[ERROR] Errors:
[ERROR] XmlMetadataTemplateTest.testDataCiteXMLCreationAllFields:248 » NullPointer Cannot invoke "String.equals(Object)" because "name" is null
[ERROR] XmlMetadataTemplateTest.testDataCiteXMLCreationAllFieldsMultipleGeoLocations:313 » NullPointer Cannot invoke "String.equals(Object)" because "name" is null
So yeah, same territory as this PR:
- #11225
Overall it seems like a nice feature! I'm curious... what's the real world use case? I assume it's something like "We give all of our datasets Permalinks by default but along the way we decide that some datasets deserve an actual DOI. We want to move these datasets to a collection where DOIs are configured. Once moved, we give them a DOI (and keep the old Permalink around as an alternative PID)." Is that close?
We offer different PID systems for our users, but asking them to choose one upfront often leads to confusion and may result in the data capture or publication process being abandoned. To address this, we have moved the selection to the final step before publication. (See screenshot below) At this stage, users can choose the appropriate PID system, after which the system moves the dataset into the corresponding collection, updates the PID, and initiates the publication request.
Initially, I considered this as a reconciliation operation that should be part of the move process to ensure dataset consistency with the collection configuration. However, after discussing it with Jim, I revised the approach. Now, the PR focuses solely on reconciliation without moving datasets.
Another use case would be test or demo instances that start with a fake DOI or permalink and later need to transition to a proper PID system. Currently, this is not possible because certain safeguards prevent the reconciliation of already published records. However, this functionality could be easily implemented if needed.
@johannes-darms phew! 6.6 is out! It looks like @vera merged the latest from develop (thanks!) but some of the questions from my last review are unanswered. Can you please take a look? Thanks!
Looks good, merging.