dataverse icon indicating copy to clipboard operation
dataverse copied to clipboard

8868 fix json import

Open sekmiller opened this issue 3 years ago • 1 comments

What this PR does / why we need it: this PR makes sure that the json export of a dataset does not include metadataLanguage if it is undefined

Which issue(s) this PR closes:

Closes #8868 dataverse_json with "metadataLanguage":"undefined" cannot be imported

Special notes for your reviewer: I moved the test for isMetadataLanguageSet to the DVObjectContainer so that it may be used in other situations where similar code was in place.

Suggestions on how to test this: make sure that a dataset exported where the metadataLanguage is undefined may be imported.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No Is there a release notes update needed for this change?: No Additional documentation: None

sekmiller avatar Aug 22 '22 16:08 sekmiller

Coverage Status

Coverage decreased (-0.003%) to 20.043% when pulling 4fdc499d2beaf07901c0646fdef03024a3df7fbf on 8868-fix-json-import into eed4775da485028d1d54bb3de4b5b1801e1e9870 on develop.

coveralls avatar Aug 22 '22 16:08 coveralls

@sekmiller @qqmyers This looked like a most straightforward PR, but I managed to find something slightly puzzling in it while testing. It definitely fixes the json export. But since the DDI export is nominally touched in the PR, I checked on that too, just in case, and I'm seeing the following: The dataset has a single Subject. In the DDI export in the develop branch:

 <keyword xml:lang="en">Medicine, Health and Life Sciences</keyword>
 <keyword>Medicine, Health and Life Sciences</keyword>

In this branch:

 <keyword xml:lang="en">Medicine, Health and Life Sciences</keyword>

I can't see immediately how the changes in the branch would result in this. (?) Also, which is the correct behavior? (to me the latter looks more correct) Jim, I added you because languages in metadata exports is your area.

landreev avatar Aug 25 '22 22:08 landreev

Two guesses: I think the design is for the keywords to show in the platform default lang, and, if different, the metaata lang as well. So two possibilities - one machine used for testing has a non-english language (i.e. C.utf8 can be the default instead) so it creates english and a default, or possibility 2: the metadatalang=default in the json, which is used as the starting point for the DDI export was tricking the code into adding the second entry somehow. Looking at the code, my guess would be the former, i.e. that the machine locale is not english. (I've seen this on Ubuntu, not sure what else does it by default). i.e. !defaultLocale.getLanguage().equals(the metadatalanguage for the dataset) at https://github.com/IQSS/dataverse/blob/1ef73b050eeaf42dd6ffe9ae735afedefd36fca9/src/main/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtil.java#L1487

Not sure though...

qqmyers avatar Aug 25 '22 23:08 qqmyers

Both branches were tested on the same system (my dev. box), on the same dataset (by deleting the cached export, asking for ddi again). Tested develop -> switched to branch -> switched back to develop - the behavior is consistent as described above. (will switch back to the branch one more time). While on the same branch, no difference between the ddi produced by the export API, vs. the export performed as part of publishing - that crossed my mind too.

landreev avatar Aug 25 '22 23:08 landreev

(the locale jumping between different settings between redeployments, unrelated to what's in the branch - not entirely impossible)

landreev avatar Aug 25 '22 23:08 landreev

@qqmyers It is in fact the result our DDI being produced from our json. The new behavior - just one subject/keyword entry - appears to be an improvement, correct? So, ok to merge the branch?

landreev avatar Aug 26 '22 18:08 landreev

Improvement or not, if we're going with this change, please document it in the release note snippet.

pdurbin avatar Aug 26 '22 19:08 pdurbin

@pdurbin

Improvement or not, if we're going with this change, please document it in the release note snippet.

Actually, no, I'm not sure this needs to be documented. I don't think the "old" behavior was ever documented, or intentional. To me it looks like it was only there since 5.11, and, again, as the result of that "metadataLanguage":"undefined" entry in the json; which was not added on purpose, and which this PR is reversing. But I'll wait for an authoritative answer from Jim.

landreev avatar Aug 26 '22 19:08 landreev

Let me pull the fix for the harvesting test from develop, which will run the integration test...

landreev avatar Aug 26 '22 19:08 landreev

"terminated abnormally", hmm. I'll try again.

landreev avatar Aug 26 '22 20:08 landreev

re:improvement - yes, I think this is just part of the bug fix - when the no metadatalanguage value changed from null to undefined, it made it get exported and caused an extra keyword line in the DDI. Fixing that fixes the exports in general and the extra keyword lines that we presumably didn't notice. So - if there's any release text change, I'd suggest something about it fixing a ~duplicate keyword tag in the ddi export.

qqmyers avatar Aug 26 '22 20:08 qqmyers

OK, sure, makes sense to mention it under "bug fixes". And we are already recommending a reexport.

landreev avatar Aug 26 '22 21:08 landreev

@pdurbin You were right - makes sense to document, as a bug fix.

landreev avatar Aug 26 '22 22:08 landreev

Oh my, the last Jenkins run must have gotten caught by the 7pm auto-kill again. This PR just can't catch a break!

landreev avatar Aug 27 '22 03:08 landreev