Improved JSON Schema validation for datasets
Overview of the Feature Request
In version 6.1 we added a JSON schema for validating dataset JSON upload files. The first release only verifies that the file is well-formed JSON and that the required elements and required fields (customized per collection) are present. The following enhancements have been requested or should be considered:
- Controlled vocabulary support: get the valid values and verify that the supplied value(s) are among them, and that a single value or multiple values are supplied as specified by the datasetfieldtype
- Field type checking: validate that the values provided conform to the type specified by the datasetfieldtype
- Enhanced error messages: the current validation library does not provide helpful error messages for every exception. See if there are other ways to provide useful information to the user.
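To make the first two items concrete, here is a minimal Python sketch of a per-field check. It is not the Dataverse implementation: `FieldType` and `check_field` are hypothetical names, the field type is reduced to a name, a multiplicity flag, and an optional set of allowed values, and only string-valued fields are handled.

```python
from dataclasses import dataclass


@dataclass
class FieldType:
    """Hypothetical stand-in for a dataset field type definition."""
    name: str
    allow_multiple: bool
    allowed_values: frozenset = frozenset()  # empty means no controlled vocabulary


def check_field(entry, field_type):
    """Return a list of problems found in one field entry (empty if none)."""
    problems = []
    value = entry.get("value")

    # Multiplicity: a list is expected only when multiple values are allowed.
    if field_type.allow_multiple and not isinstance(value, list):
        problems.append(f"{field_type.name}: expected a list of values")
        return problems
    if not field_type.allow_multiple and isinstance(value, list):
        problems.append(f"{field_type.name}: expected a single value")
        return problems

    values = value if isinstance(value, list) else [value]
    for v in values:
        # Type check: this sketch only covers string-valued fields.
        if not isinstance(v, str):
            problems.append(f"{field_type.name}: {v!r} is not a string")
        # Controlled vocabulary: the value must be one of the allowed values.
        elif field_type.allowed_values and v not in field_type.allowed_values:
            problems.append(
                f"{field_type.name}: {v!r} is not in the controlled vocabulary"
            )
    return problems


subject = FieldType("subject", True, frozenset({"Chemistry", "Physics"}))
print(check_field({"typeName": "subject", "value": ["Chemistry", "Alchemy"]}, subject))
# prints ["subject: 'Alchemy' is not in the controlled vocabulary"]
```

In a real validator the allowed values and the multiplicity flag would come from the datasetfieldtype rather than being hard-coded as they are here.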
What kind of user is the feature intended for? (Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin) Users with edit dataset permission who wish to validate their dataset JSON prior to upload.
What inspired the request? Responses to the first release of the dataset schema that arrived too late to be included in 6.1.
What existing behavior do you want changed? The get dataset schema API and the validator.
Which issue(s) this PR closes: https://github.com/IQSS/dataverse/issues/10169
Closes #10169
Special notes for your reviewer:
Suggestions on how to test this:
Does this PR introduce a user interface change? If mockups are available, please link/include them here: No
Is there a release notes update needed for this change?: Yes. Included
Additional documentation: native-api.rst
Preview at https://dataverse-guide--10543.org.readthedocs.build/en/10543/api/native-api.html#validate-dataset-json-file-for-a-collection
coverage: 20.804% (+0.05%) from 20.752% when pulling 55a8bceb68957799f2af2742726b0e092a3a8f00 on 10169-JSON-schema-validation into 7d4d534338161b5f0f5a1ce0079304f8ec3b7a80 on develop.
:package: Pushed preview images as
ghcr.io/gdcc/dataverse:10169-JSON-schema-validation
ghcr.io/gdcc/configbaker:10169-JSON-schema-validation
:ship: See on GHCR. Use by referencing with full name as printed above, mind the registry name.
To me, it looks great already! I have two small points that could be beneficial for integration into external tools/libs:
The message is good, but it is currently limited to a human-readable format. Adding a JSONPath or any other path that displays the exact location would allow other libraries to do more with the validation result. Furthermore, adding different error types could help. For instance, if a type validation fails, this could be indicated.
A response could look something like this (the paths are illustrative, not accurate):
{
  "is_valid": "no",
  "errors": [
    {
      "location": "citation/fields/0/value",
      "error_type": "required",
      "message": "The title field is required."
    },
    {
      "location": "citation/fields/1/value",
      "error_type": "invalid",
      "message": "The description must be a string."
    }
  ]
}
Is it possible to derive such a format from your validator?
Unfortunately, the validator stops at the first issue it finds, so returning a list would not be possible. It may be possible to add more information to the error to help pinpoint the exact location of the issue.
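As a rough illustration of what "more information" on a single error could look like, the Python sketch below stops at the first problem but attaches a location and an error type to it. This is not the validator used in the PR: `ValidationFailure`, `check_block`, and `validate` are hypothetical names, the location format and the `required`/`invalid` error types follow the example earlier in this thread, and the field layout (`typeName`, `typeClass`, `multiple`, `value`) is assumed to match the dataset JSON.

```python
class ValidationFailure(Exception):
    """Single validation error with a machine-readable location and type."""

    def __init__(self, location, error_type, message):
        super().__init__(message)
        self.location = location
        self.error_type = error_type
        self.message = message

    def to_dict(self):
        return {
            "location": self.location,
            "error_type": self.error_type,
            "message": self.message,
        }


def check_block(block_name, fields, required_names):
    """Raise ValidationFailure at the first problem found in a metadata block."""
    for index, entry in enumerate(fields):
        name = entry.get("typeName")
        value = entry.get("value")
        location = f"{block_name}/fields/{index}/value"
        # A required field with an empty value is reported first.
        if name in required_names and value in (None, "", []):
            raise ValidationFailure(
                location, "required", f"The {name} field is required."
            )
        # A single-valued primitive field must hold a string.
        if (
            entry.get("typeClass") == "primitive"
            and not entry.get("multiple")
            and not isinstance(value, str)
        ):
            raise ValidationFailure(
                location, "invalid", f"The {name} value must be a string."
            )


def validate(block_name, fields, required_names):
    """Return None if the block is valid, else a dict describing the first error."""
    try:
        check_block(block_name, fields, required_names)
    except ValidationFailure as failure:
        return failure.to_dict()
    return None
```

Because the check stops at the first failure, the result is a single error object rather than a list, which matches the constraint described above.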
The PR looks great; I meant to merge it last night. I didn't go out of my way trying to break it, but all the valid JSON files I tried were accepted, and all the broken ones were recognized as invalid, with meaningful error messages. The error message for missing required fields could be prettier, tbh (maybe list the specific fields that must be populated?), but it didn't feel serious enough to complain about. Merging.
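For reference, listing the specific missing fields could be sketched along these lines in Python. This is only an illustration, not code from the PR: `missing_required` and `format_message` are hypothetical helpers, the required field names per metadata block are passed in by hand, and the dataset JSON is assumed to use the `datasetVersion` / `metadataBlocks` / `fields` layout.

```python
def missing_required(dataset, required):
    """Map each metadata block name to its sorted list of missing required fields."""
    blocks = dataset.get("datasetVersion", {}).get("metadataBlocks", {})
    missing = {}
    for block_name, required_names in required.items():
        fields = blocks.get(block_name, {}).get("fields", [])
        # A field counts as present only if it has a non-empty value.
        present = {
            f.get("typeName")
            for f in fields
            if f.get("value") not in (None, "", [])
        }
        absent = sorted(set(required_names) - present)
        if absent:
            missing[block_name] = absent
    return missing


def format_message(missing):
    """Build a one-line, human-readable message naming the missing fields."""
    if not missing:
        return "All required fields are present."
    parts = [f"{block}: {', '.join(names)}" for block, names in sorted(missing.items())]
    return "Missing required fields in " + "; ".join(parts)


dataset = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {"fields": [{"typeName": "title", "value": "Example"}]}
        }
    }
}
print(format_message(missing_required(dataset, {"citation": {"title", "author", "subject"}})))
# prints Missing required fields in citation: author, subject
```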