dataverse
dataverse copied to clipboard
GDCC/8924 - Check archival status
What this PR does / why we need it: Remove any html tags from supplied archival status (new APIs being added in 5.12). Also updates the Jsoup class we use in general.
Which issue(s) this PR closes:
Closes #8924
Special notes for your reviewer:
FWIW: The Jsoup update appears to just be a classname change but the docs do say the old class will be retired soon.
Suggestions on how to test this: Use the archival status API (must be superuser) to put an HTML message and confirm the result from doing a get (or what's displayed) does not include html.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation:
Coverage decreased (-0.006%) to 20.042% when pulling 0909f7e0c89d4c13335f6c637575363f6b9ae790 on GlobalDataverseCommunityConsortium:GDCC/8605-add-archival-status into 6ca1c9d873218cb1e28f8443e7e242912ae09c8f on IQSS:develop.
[My first foray into QA]
It looks good overall. the plain vanilla path does what it purports to do. If there are normal exceptions - dataset not found, version not available, user not found, user without proper permissions there are good error messages returned.
The only thing that I can see that needs to be addressed is that a JsonParsingException is not being caught and so the response code is 500 and the user only gets empty braces in the console. Looking at the logs reveals the exception.
Is that new? I would guess it is from https://github.com/IQSS/dataverse/blob/094c90a074d7bea4d884123690b5bb26f85d5284/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java#L3326 and the framework returning 500 before our code is invoked. If so, I could change to having the method start with a string so we can parse the initial json ourselves and return a better error.
So - basically the question is - is the use case sending bad json in or sending in valid json that, after html tags are stripped, becomes bad?
You're right. It's not new. The other methods in Datasets that catch the parsing exceptions are taking in strings and parsing them inside the method. Let me think about having a case where stripping the tags would yield a parsing exception - if that could happen.
My guess is that's hard to do, but I can add a change to use string and catch the parse exception post sanitization. (It probably makes more sense to avoid converting to Json twice anyway.)