8868 fix json import
What this PR does / why we need it: this PR makes sure that the json export of a dataset does not include metadataLanguage if it is undefined
Which issue(s) this PR closes:
Closes #8868 dataverse_json with "metadataLanguage":"undefined" cannot be imported
Special notes for your reviewer: I moved the test for isMetadataLanguageSet to the DVObjectContainer so that it may be used in other situations where similar code was in place.
Suggestions on how to test this: make sure that a dataset exported where the metadataLanguage is undefined may be imported.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: No Is there a release notes update needed for this change?: No Additional documentation: None
Coverage decreased (-0.003%) to 20.043% when pulling 4fdc499d2beaf07901c0646fdef03024a3df7fbf on 8868-fix-json-import into eed4775da485028d1d54bb3de4b5b1801e1e9870 on develop.
@sekmiller @qqmyers This looked like a most straightforward PR, but I managed to find something slightly puzzling in it while testing. It definitely fixes the json export. But since the DDI export is nominally touched in the PR, I checked on that too, just in case, and I'm seeing the following: The dataset has a single Subject. In the DDI export in the develop branch:
<keyword xml:lang="en">Medicine, Health and Life Sciences</keyword>
<keyword>Medicine, Health and Life Sciences</keyword>
In this branch:
<keyword xml:lang="en">Medicine, Health and Life Sciences</keyword>
I can't see immediately how the changes in the branch would result in this. (?) Also, which is the correct behavior? (to me the latter looks more correct) Jim, I added you because languages in metadata exports is your area.
Two guesses: I think the design is for the keywords to show in the platform default lang, and, if different, the metaata lang as well. So two possibilities - one machine used for testing has a non-english language (i.e. C.utf8 can be the default instead) so it creates english and a default, or possibility 2: the metadatalang=default in the json, which is used as the starting point for the DDI export was tricking the code into adding the second entry somehow. Looking at the code, my guess would be the former, i.e. that the machine locale is not english. (I've seen this on Ubuntu, not sure what else does it by default). i.e. !defaultLocale.getLanguage().equals(the metadatalanguage for the dataset) at https://github.com/IQSS/dataverse/blob/1ef73b050eeaf42dd6ffe9ae735afedefd36fca9/src/main/java/edu/harvard/iq/dataverse/export/ddi/DdiExportUtil.java#L1487
Not sure though...
Both branches were tested on the same system (my dev. box), on the same dataset (by deleting the cached export, asking for ddi again). Tested develop -> switched to branch -> switched back to develop - the behavior is consistent as described above. (will switch back to the branch one more time). While on the same branch, no difference between the ddi produced by the export API, vs. the export performed as part of publishing - that crossed my mind too.
(the locale jumping between different settings between redeployments, unrelated to what's in the branch - not entirely impossible)
@qqmyers It is in fact the result our DDI being produced from our json. The new behavior - just one subject/keyword entry - appears to be an improvement, correct? So, ok to merge the branch?
Improvement or not, if we're going with this change, please document it in the release note snippet.
@pdurbin
Improvement or not, if we're going with this change, please document it in the release note snippet.
Actually, no, I'm not sure this needs to be documented. I don't think the "old" behavior was ever documented, or intentional. To me it looks like it was only there since 5.11, and, again, as the result of that "metadataLanguage":"undefined" entry in the json; which was not added on purpose, and which this PR is reversing.
But I'll wait for an authoritative answer from Jim.
Let me pull the fix for the harvesting test from develop, which will run the integration test...
"terminated abnormally", hmm. I'll try again.
re:improvement - yes, I think this is just part of the bug fix - when the no metadatalanguage value changed from null to undefined, it made it get exported and caused an extra keyword line in the DDI. Fixing that fixes the exports in general and the extra keyword lines that we presumably didn't notice. So - if there's any release text change, I'd suggest something about it fixing a ~duplicate keyword tag in the ddi export.
OK, sure, makes sense to mention it under "bug fixes". And we are already recommending a reexport.
@pdurbin You were right - makes sense to document, as a bug fix.
Oh my, the last Jenkins run must have gotten caught by the 7pm auto-kill again. This PR just can't catch a break!