dataverse icon indicating copy to clipboard operation
dataverse copied to clipboard

PID reconcile command

Open johannes-darms opened this issue 1 year ago • 4 comments

What this PR does / why we need it:

Please refer to #10501

Which issue(s) this PR closes:

Closes #10501

Special notes for your reviewer/ Suggestions on how to test this:**: Configure two PID Provider via docker-compose-file Alter this line:

  -Ddataverse.pid.providers=fake,perma

add those lines to the dataverse env:

      
        -Ddataverse.pid.perma.type=perma
        -Ddataverse.pid.perma.label=PermaProvider
        -Ddataverse.pid.perma.permalink.base-url=https://example.org/
        -Ddataverse.pid.perma.permalink.separator=-
        -Ddataverse.pid.perma.authority=identifier

Start up dataverse. Create a new dataset with a datafile. Then change the pidProvider of the dataset.

curl -X PUT --location "http://localhost:8080/api/v1/datasets/{id}/pidReconcile/perma" \
    -H "X-Dataverse-key: $KEY" -d 'perma'

Then reconcile the pid via API call:

curl -X PUT --location "http://localhost:8080/api/v1/datasets/{id}/pidReconcile" \
    -H "X-Dataverse-key: $KEY"

Verify that the PID changes in the UI and a notification apprears.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No. Is there a release notes update needed for this change?: Yes, will be added later.

johannes-darms avatar May 17 '24 14:05 johannes-darms

Coverage Status

coverage: 22.831% (+0.08%) from 22.751% when pulling ee4fdeef0d6193867d33daa337c6fbae7242f822 on johannes-darms:feat/reconcilepid into 6be4f20f216d1f3d84cdf44fa98d615a59c5bfe2 on IQSS:develop.

coveralls avatar May 17 '24 14:05 coveralls

@johannes-darms - I made several related comments about avoiding mixing the effective Pid generator and PidProvider for the existing PID concepts. If that's not clear, let's have a call or slack, etc.

qqmyers avatar May 21 '24 18:05 qqmyers

I made several related comments about avoiding mixing the effective Pid generator and PidProvider for the existing PID concepts. If that's not clear, let's have a call or slack, etc.

Thanks for the explanation I wasn't aware of the distinction. I'll update the code accordingly.

johannes-darms avatar May 22 '24 06:05 johannes-darms

@qqmyers : I've adapted the PR according to your feedback. Could you have another look?

johannes-darms avatar Jun 05 '24 06:06 johannes-darms

@qqmyers it took a while, but I finally managed to implement all your change requests. Would you like to look over it again?

johannes-darms avatar Aug 05 '24 14:08 johannes-darms

@johannes-darms heads up that there are merge conflicts.

pdurbin avatar Sep 10 '24 18:09 pdurbin

@pdurbin @qqmyers I've resolved the merge conflicts and addressed the open review comment. Please feel free to re-review this PR. :+1:

However, I am not sure why the Jenkins check is currently failing. If there's an error for me to fix, could you let me know what it is?

vera avatar Feb 06 '25 13:02 vera

@vera - I'm seeing the following in the log. If it doesn't make sense, let me know and I can dig for more info:

[INFO] Running edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest
Feb 05, 2025 4:09:14 PM edu.harvard.iq.dataverse.DataCitation getDateFrom
WARNING: Unable to find citation date for datasetversion: null
Feb 05, 2025 4:09:14 PM edu.harvard.iq.dataverse.DataCitation getDateFrom
WARNING: Unable to find citation date for datasetversion: null
[ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.132 s <<< FAILURE! -- in edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest
[ERROR] edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFieldsMultipleGeoLocations -- Time elapsed: 0.039 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "name" is null
	at edu.harvard.iq.dataverse.pidproviders.PidUtil.getPidProvider(PidUtil.java:229)
	at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:107)
	at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:84)
	at edu.harvard.iq.dataverse.pidproviders.doi.datacite.DOIDataCiteRegisterService.getMetadataFromDvObject(DOIDataCiteRegisterService.java:169)
	at edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFieldsMultipleGeoLocations(XmlMetadataTemplateTest.java:313)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

[ERROR] edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreation -- Time elapsed: 0.002 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "name" is null
	at edu.harvard.iq.dataverse.pidproviders.PidUtil.getPidProvider(PidUtil.java:229)
	at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:107)
	at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:84)
	at edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreation(XmlMetadataTemplateTest.java:177)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

[ERROR] edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFields -- Time elapsed: 0.014 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because "name" is null
	at edu.harvard.iq.dataverse.pidproviders.PidUtil.getPidProvider(PidUtil.java:229)
	at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:107)
	at edu.harvard.iq.dataverse.pidproviders.doi.XmlMetadataTemplate.generateXML(XmlMetadataTemplate.java:84)
	at edu.harvard.iq.dataverse.pidproviders.doi.datacite.DOIDataCiteRegisterService.getMetadataFromDvObject(DOIDataCiteRegisterService.java:169)
	at edu.harvard.iq.dataverse.pidproviders.doi.datacite.XmlMetadataTemplateTest.testDataCiteXMLCreationAllFields(XmlMetadataTemplateTest.java:248)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

qqmyers avatar Feb 06 '25 13:02 qqmyers

Thanks! It seems there was some interaction between the new tests and some old tests, because the new tests didn't remove the PID providers they created. I've pushed a commit cleaning up the tests so they should now pass.

By the way, since the test steps for testing the API in the initial comment are outdated, here are some updated steps:

  1. Start up Dataverse with at least two configured PID providers (e.g. by editing docker-compose-dev.yaml as described in the initial comment)

  2. Create collection configured for one of the PID providers, e.g. Permalinks

  3. Create dataset

  4. Use API to reconcile PID of that dataset (expected: no effect)

    $ curl -X PUT "http://localhost:8080/api/datasets/:persistentId/pidReconcile?persistentId=$PERSISTENT_ID" -H "X-Dataverse-key: $API_KEY"

    {"status":"ERROR","message":"PID Provider perma is same as configured. This Operation has no effect!"}

  5. Use API to set PID generator of that dataset (to another PID provider than the collection, e.g. fake DOIs)

    curl -X PUT -H "X-Dataverse-key: $API_TOKEN" -H 'Content-type: application/json' -d fake "http://localhost:8080/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_ID"

  6. Use API to reconcile PID of that dataset (expected: dataset receives new PID)

    curl -X PUT "http://localhost:8080/api/datasets/:persistentId/pidReconcile?persistentId=$PERSISTENT_ID" -H "X-Dataverse-key: $API_TOKEN"

    {"status":"OK","data":{"message":"doi:10.5072/FK2/CR3K6U"}}

  7. Repeat steps 3-6, but publish dataset before trying to reconcile PID (expected: no effect)

    $ curl -X PUT "http://localhost:8080/api/datasets/:persistentId/pidReconcile?persistentId=$PERSISTENT_ID" -H "X-Dataverse-key: $API_KEY"

    {"status":"ERROR","message":"Dataset already published, cannot alter PID Provider"}

vera avatar Feb 06 '25 15:02 vera

@vera = FWIW: I think I found the cause of the most recent test fails - see #11225.

qqmyers avatar Feb 06 '25 23:02 qqmyers

For the record, this run yesterday was successful: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10567/12/testReport/ . All API tests passed in about 17 minutes.

pdurbin avatar Feb 07 '25 14:02 pdurbin

Ok, yes, in the latest run at https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10567/13/console I'm seeing this:

[ERROR] Errors: 
[ERROR]   XmlMetadataTemplateTest.testDataCiteXMLCreationAllFields:248 » NullPointer Cannot invoke "String.equals(Object)" because "name" is null
[ERROR]   XmlMetadataTemplateTest.testDataCiteXMLCreationAllFieldsMultipleGeoLocations:313 » NullPointer Cannot invoke "String.equals(Object)" because "name" is null

So yeah, same territory as this PR:

  • #11225

pdurbin avatar Feb 07 '25 14:02 pdurbin

Overall it seems like a nice feature! I'm curious... what's the real world use case? I assume it's something like "We give all of our datasets Permalinks by default but along the way we decide that some datasets deserve an actual DOI. We want to move these datasets to a collection where DOIs are configured. Once moved, we give them a DOI (and keep the old Permalink around as an alternative PID)." Is that close?

We offer different PID systems for our users, but asking them to choose one upfront often leads to confusion and may result in the data capture or publication process being abandoned. To address this, we have moved the selection to the final step before publication. (See screenshot below) At this stage, users can choose the appropriate PID system, after which the system moves the dataset into the corresponding collection, updates the PID, and initiates the publication request.

Initially, I considered this as a reconciliation operation that should be part of the move process to ensure dataset consistency with the collection configuration. However, after discussing it with Jim, I revised the approach. Now, the PR focuses solely on reconciliation without moving datasets.

Another use case would be test or demo instances that start with a fake DOI or permalink and later need to transition to a proper PID system. Currently, this is not possible because certain safeguards prevent the reconciliation of already published records. However, this functionality could be easily implemented if needed.

409592737-6bdfba2d-ec88-44e5-b2d0-3e47734955b7

johannes-darms avatar Feb 10 '25 07:02 johannes-darms

@johannes-darms phew! 6.6 is out! It looks like @vera merged the latest from develop (thanks!) but some of the questions from my last review are unanswered. Can you please take a look? Thanks!

pdurbin avatar Mar 20 '25 19:03 pdurbin

Looks good, merging.

ofahimIQSS avatar Apr 15 '25 17:04 ofahimIQSS