Invalid CVocConf leads to 500 errors
What steps does it take to reproduce the issue?
Change the CVocConf setting with
curl -X PUT --upload-file cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf
- When does this issue occur?
When updating the CVocConf setting at http://localhost:8080/api/admin/settings/:CVocConf
as explained in the guidelines:
curl -X PUT --upload-file cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf
If an item in that JSON array is invalid (I guess it is checked against CVocConf.schema.json), the server stops working entirely and redirects to "500 - Internal Server Error - An unexpected error was encountered, no more information is available."
For example, I removed the key js-url.
- Which page(s) does it occur on?
Dataverse main/index page
- What happens?
500 errors
- To whom does it occur (all users, curators, superusers)?
- What did you expect to happen?
Only the external vocabulary script not running, while the rest of the site keeps working.
Which version of Dataverse are you using?
v. 5.12.1 build 1122-cf90431
@transfluxus thanks for the bug report. Can you please attach your server.log file? It should show more details about the 500 error.
Hi @pdurbin
Would it be in /usr/local/payara5/glassfish/domains/domain1/logs?
There are a lot of logs there, going back to 2022, and one named server.log_2024-06-13T07-29-51.
Attached: logs1.txt
Now that I understand the schema and the CVoc mechanism a bit better, I see that the problem does not always occur. But, for example, this array item triggers it ("js-urlX" instead of "js-url", and "retrieval-filtering" missing):
{
  "field-name": "extLCProject",
  "term-uri-field": "extLCProject",
  "js-urlX": "https://diskordier.netX/lcProject.js",
  "protocol": "localcontexts",
  "retrieval-uri": "https://localcontextshub.org/api/v1/projects/?search={0}",
  "allow-free-text": true,
  "prefix": "https://localcontextshub.org/",
  "managed-fields": {},
  "languages": "",
  "vocabs": {
    "localcontexts": {
      "uriSpace": "https://localcontextshub.org/"
    }
  }
}
It would be a good addition to improve handling of this error in Dataverse. As a work-around, there is a schema at https://github.com/gdcc/dataverse-external-vocab-support/blob/main/examples/config/CVocConf.schema.json that can be used to validate a CVoc config file.
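A quick pre-upload check along these lines can be sketched in Python using only the standard library. This is not a full JSON Schema validation; it only reports array entries that lack certain keys. The names `find_missing_keys`, `check_file` and `REQUIRED_KEYS` are illustrative assumptions, and the two keys listed are merely the ones whose absence triggered the error in this thread. The authoritative list of required fields is in CVocConf.schema.json.

```python
import json

# Assumption: only the two keys named in this thread; the schema file is authoritative.
REQUIRED_KEYS = {"js-url", "retrieval-filtering"}


def find_missing_keys(entries, required_keys=REQUIRED_KEYS):
    """Return {index: sorted list of missing keys} for entries lacking a required key."""
    problems = {}
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            # A non-object entry cannot contain any of the required keys.
            problems[index] = sorted(required_keys)
            continue
        missing = sorted(set(required_keys) - set(entry))
        if missing:
            problems[index] = missing
    return problems


def check_file(path):
    """Load a CVocConf JSON file (expected to be an array) and report missing keys."""
    with open(path, encoding="utf-8") as handle:
        entries = json.load(handle)
    if not isinstance(entries, list):
        raise ValueError("CVocConf is expected to be a JSON array")
    return find_missing_keys(entries)
```

Running `check_file("cvoc-conf.json")` before the curl upload returns an empty dict when no entry lacks the listed keys. A misspelled key such as "js-urlX" shows up as a missing "js-url".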
@transfluxus thanks. I can't find anything in that log but I like @qqmyers's idea of validating the JSON on the side (outside of Dataverse) before uploading.
And yes, perhaps someday Dataverse could perform the validation itself.
Yeah, the datetime of the log file is also a bit off. I will clean that folder, try to cause the error again, and see whether a new file appears.
Indeed, the schema file is very important, since it seems to be the only place where all fields for a CVoc protocol are documented in detail. I ended up using https://www.jsonschemavalidator.net/, which works well for quick validation against a schema.
I can make some notes into the guidelines and make a pull request.
That would be great. Part of features being 'experimental' is that we don't have as much documentation and perhaps limited error handling, so contributions to either are appreciated (here and/or in the external vocab repo).
Waiting on user to make a pull request
@transfluxus Do you have an update on this?
Regarding the documentation, I think it makes sense to copy some parts of https://github.com/gdcc/dataverse-external-vocab-support/tree/main/docs into the README of the repo, or maybe even into the main docs. I hope I can get to it after our meeting.
Regarding the implementation, I found this todo in the code: https://github.com/IQSS/dataverse/blob/dddcf29188a5c35174f3c94ffc1c4cb1d7fc0552/src/main/java/edu/harvard/iq/dataverse/DatasetFieldServiceBean.java#L281
I guess it would be sufficient to validate each JsonObject jo against the object schema rather than the whole array, so that individual invalid CVoc specifications can be filtered out. That would require adding a validator library as a dependency to the project, since I haven't found one in the pom file yet.
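The per-entry filtering idea could look roughly like the following, sketched here in Python rather than Java to keep it short. The validator is passed in as a plain callable, so any schema library could be plugged in later. `partition_entries` and `has_js_url` are hypothetical names for illustration and are not Dataverse code.

```python
def partition_entries(entries, is_valid):
    """Split a CVocConf array into (valid, rejected) lists.

    `is_valid` is any callable returning a truthy value for an acceptable entry.
    Rejected entries are paired with their index so they can be logged.
    """
    valid, rejected = [], []
    for index, entry in enumerate(entries):
        try:
            ok = bool(is_valid(entry))
        except Exception:
            # A validator that raises is treated as a rejection, not a crash.
            ok = False
        if ok:
            valid.append(entry)
        else:
            rejected.append((index, entry))
    return valid, rejected


def has_js_url(entry):
    """Hypothetical stand-in check; a real validator would apply the full schema."""
    return isinstance(entry, dict) and "js-url" in entry
```

With this shape, one broken entry only disables its own field, and the rejected list gives the server something concrete to write to the log instead of failing with a 500.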
2024/07/10
- What are next steps, folks? @transfluxus @pdurbin @sbarbosadataverse
@cmbz I believe the next step is for @transfluxus to make a pull request (anything you come up with is fine, really!) I don't think we need to heavily track this effort. We can probably remove it from the main board (34) and just look at the pull request whenever it comes in. No rush, that is. @cmbz @sbarbosadataverse what do you think?
@pdurbin sounds good! @transfluxus Please give us a heads up when you create the pull request. Thanks!