
Fix DDI Import for otherId

Open luddaniel opened this issue 1 year ago • 2 comments

What this PR does / why we need it:

This PR fixes an issue that occurs while harvesting DDI records containing multiple otherId values. The fix mirrors the otherId configuration in citation.tsv (allowmultiples = TRUE): https://github.com/IQSS/dataverse/blob/7d4d534338161b5f0f5a1ce0079304f8ec3b7a80/scripts/api/data/metadatablocks/citation.tsv#L8

luddaniel avatar Aug 14 '24 09:08 luddaniel

Coverage Status

coverage: 21.224%. remained the same when pulling 483e99fcbf326586c35b6fd024b7f88a287cb064 on Recherche-Data-Gouv:fix_importDDI_otherId into 6a00ce51cc5072695411c8100238a3165506ba70 on IQSS:develop.

coveralls avatar Aug 14 '24 09:08 coveralls

@luddaniel Can you please provide additional details on what this PR is doing and how we can test it.

ofahimIQSS avatar Oct 25 '24 16:10 ofahimIQSS

@ofahimIQSS This PR fixes the error edu.harvard.iq.dataverse.util.json.JsonParseException: incorrect multiple for field otherId, which occurs when harvested DDI data contains multiple otherId values. It can be reproduced by harvesting https://data.progedo.fr/oai with the following client configuration (client.json):

{
  "nickName": "progedo",
  "dataverseAlias": "root",
  "type": "oai",
  "style": "default",
  "harvestUrl": "https://data.progedo.fr/oai",
  "archiveUrl": "https://data.progedo.fr",
  "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
  "metadataFormat": "oai_ddi25",
  "schedule": "none",
  "allowHarvestingMissingCVV": true
}

curl -H "X-Dataverse-key: 600e69df-f046-490a-824e-33f3430b9476" -H "Content-Type: application/json" -X POST "http://localhost:8080/api/harvest/clients/progedo" --upload-file "client.json"
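For anyone scripting the test setup, the same request can be sketched in Python. This is only an illustration under stated assumptions: the server URL, nickname, and payload are copied from the curl command and client.json above, while the function names build_create_client_request and send are hypothetical helpers, and the token is passed in by the caller rather than hard-coded.

```python
import json
import urllib.request

# Values taken from the curl command and client.json in this thread.
SERVER_URL = "http://localhost:8080"
NICKNAME = "progedo"

CLIENT = {
    "nickName": "progedo",
    "dataverseAlias": "root",
    "type": "oai",
    "style": "default",
    "harvestUrl": "https://data.progedo.fr/oai",
    "archiveUrl": "https://data.progedo.fr",
    "archiveDescription": (
        "This Dataset is harvested from our partners. Clicking the link will "
        "take you directly to the archival source of the data."
    ),
    "metadataFormat": "oai_ddi25",
    "schedule": "none",
    "allowHarvestingMissingCVV": True,
}


def build_create_client_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates the harvesting client."""
    body = json.dumps(CLIENT).encode("utf-8")
    return urllib.request.Request(
        f"{SERVER_URL}/api/harvest/clients/{NICKNAME}",
        data=body,
        headers={"X-Dataverse-key": token, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request to a running installation and return the response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling send(build_create_client_request(token)) against a local installation should have the same effect as the curl command; building the request separately makes it easy to inspect before anything is sent.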

This cannot be reproduced by importing a dataset into a Dataverse installation from a DDI file, because the corresponding line in that code path is already correct: https://github.com/IQSS/dataverse/blob/7d4d534338161b5f0f5a1ce0079304f8ec3b7a80/src/main/java/edu/harvard/iq/dataverse/api/imports/ImportDDIServiceBean.java#L1439
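As a rough illustration of why the multiplicity setting matters, the sketch below models the kind of check that produces the "incorrect multiple for field otherId" message. It is not the actual Dataverse parser code: FieldType, JsonParseError, check_multiple, and other_id_field are hypothetical names, and the simple flag comparison is an assumption about how such a check might work. The field layout follows the Dataverse dataset JSON convention of typeName, typeClass, multiple, and value.

```python
from dataclasses import dataclass


class JsonParseError(Exception):
    """Illustrative stand-in for Dataverse's JsonParseException."""


@dataclass(frozen=True)
class FieldType:
    name: str
    allow_multiples: bool


# Mirrors the citation.tsv setting cited in this PR: otherId allows multiples.
OTHER_ID = FieldType(name="otherId", allow_multiples=True)


def check_multiple(field_json: dict, field_type: FieldType) -> None:
    """Reject a field whose "multiple" flag disagrees with its definition."""
    if bool(field_json.get("multiple")) != field_type.allow_multiples:
        raise JsonParseError(f"incorrect multiple for field {field_type.name}")


def primitive(type_name: str, value: str) -> dict:
    """Build a single-valued primitive child field."""
    return {
        "typeName": type_name,
        "typeClass": "primitive",
        "multiple": False,
        "value": value,
    }


def other_id_field(ids: list[tuple[str, str]]) -> dict:
    """Build otherId as a multi-valued compound field, one entry per id."""
    return {
        "typeName": "otherId",
        "typeClass": "compound",
        "multiple": True,
        "value": [
            {
                "otherIdAgency": primitive("otherIdAgency", agency),
                "otherIdValue": primitive("otherIdValue", value),
            }
            for agency, value in ids
        ],
    }
```

Under these assumptions, a field built by other_id_field passes check_multiple, while the same field with "multiple" set to False raises the error, which is the mismatch the harvesting path ran into.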

@pdurbin @landreev Is there a way to not waste too much time for this type of PR? Finding the problem and making a PR took me 1 hour whereas providing a detailed explanation, material to test took me 4 hours of headache.

luddaniel avatar Oct 30 '24 16:10 luddaniel

@luddaniel sorry about your headache. I'm not sure what would help. Do you have any suggestions?

pdurbin avatar Oct 30 '24 21:10 pdurbin

@luddaniel Thank you for providing the testing details.

After uploading the client.json file with the same data provided, I ran the harvest job and received Success with "1660 failed" in results. I am testing in my local environment. Harvest Log/server log can be found below:

Attachments: [image], harvest_cleanup_progedo_2024-11-04T18-18-49.txt, harvest_progedo_2024-11-04T18-18-49.log, server.log

ofahimIQSS avatar Nov 04 '24 19:11 ofahimIQSS

@ofahimIQSS My bad. To fix the "Specified metadatalanguage not allowed" error, I forgot to give you this instruction:

curl http://localhost:8080/api/admin/settings/:MetadataLanguages -X PUT -d '[{"locale":"en","title":"English"},{"locale":"fr","title":"Français"}]'

This should give you SUCCESS; 1072 harvested, 0 deleted, 588 failed. Without this PR (for example, on the develop branch) you should see edu.harvard.iq.dataverse.util.json.JsonParseException: incorrect multiple for field otherId.

Updated with develop, as it contained a harvesting regression fixed by #10990.

luddaniel avatar Nov 05 '24 14:11 luddaniel

@luddaniel Thanks again - that did the trick. Testing Complete - Merging PR

Testing of 10772.docx

ofahimIQSS avatar Nov 05 '24 16:11 ofahimIQSS