Improved JSON Schema validation for datasets

Open stevenwinship opened this issue 1 year ago • 11 comments

Overview of the Feature Request: In version 6.1 we added a JSON schema for validating dataset JSON upload files. The first release only verifies that the JSON is well formed and that required elements and required fields (customized by collection) are present. The following enhancements have been requested or should be considered:

  • Controlled vocabulary support - get the valid values and verify that the supplied value(s) are among them (also validate that a single value or multiple values are given, as specified by the datasetfieldtype)
  • Field type checking - validate that the values provided conform to the type specified by the datasetfieldtype
  • Enhanced error messages - the current validation library does not provide helpful error messages for every exception. See if there are other ways to provide useful information to the user.
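As a rough illustration of the first two bullets, here is a minimal Python sketch of checking cardinality, value type, and controlled vocabulary for a single field entry. It is not the validator in this PR: `FieldSpec` and `check_field` are hypothetical names, and the spec is a hand-written stand-in for what a datasetfieldtype would supply.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a datasetfieldtype definition."""
    type_name: str
    value_type: type = str
    allow_multiple: bool = False
    # An empty vocabulary means the field is free text (assumption).
    vocabulary: frozenset = field(default_factory=frozenset)


def check_field(entry: dict, spec: FieldSpec) -> list[str]:
    """Return a list of problems found in one dataset JSON field entry."""
    problems = []
    value = entry.get("value")
    is_list = isinstance(value, list)

    # Cardinality: a list is acceptable only when the field allows multiples.
    if is_list != spec.allow_multiple:
        expected = "a list of values" if spec.allow_multiple else "a single value"
        problems.append(f"{spec.type_name}: expected {expected}")
        return problems

    for item in value if is_list else [value]:
        if not isinstance(item, spec.value_type):
            # Type check against the declared value type.
            problems.append(
                f"{spec.type_name}: {item!r} is not of type {spec.value_type.__name__}"
            )
        elif spec.vocabulary and item not in spec.vocabulary:
            # Controlled vocabulary check, only when a vocabulary is defined.
            problems.append(
                f"{spec.type_name}: {item!r} is not in the controlled vocabulary"
            )
    return problems


subject = FieldSpec(
    "subject",
    allow_multiple=True,
    vocabulary=frozenset({"Chemistry", "Physics"}),
)
entry = {
    "typeName": "subject",
    "multiple": True,
    "typeClass": "controlledVocabulary",
    "value": ["Chemistry", "Alchemy"],
}
print(check_field(entry, subject))
```

Returning a list of problems rather than raising on the first one is a design choice of this sketch only; it says nothing about how the schema validator itself behaves.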

What kind of user is the feature intended for? (Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin) Users with edit dataset permission who wish to validate their dataset JSON prior to upload.

What inspired the request? Responses to the first release of the dataset schema that arrived too late to be included in 6.1.

What existing behavior do you want changed? The get dataset schema API and validator.

Which issue(s) this PR closes: https://github.com/IQSS/dataverse/issues/10169

Closes #10169

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Yes. Included

Additional documentation: native-api.rst

Preview at https://dataverse-guide--10543.org.readthedocs.build/en/10543/api/native-api.html#validate-dataset-json-file-for-a-collection

stevenwinship avatar May 07 '24 17:05 stevenwinship

coverage: 20.804% (+0.05%) from 20.752% when pulling 55a8bceb68957799f2af2742726b0e092a3a8f00 on 10169-JSON-schema-validation into 7d4d534338161b5f0f5a1ce0079304f8ec3b7a80 on develop.

coveralls avatar May 07 '24 17:05 coveralls

:package: Pushed preview images as

ghcr.io/gdcc/dataverse:10169-JSON-schema-validation
ghcr.io/gdcc/configbaker:10169-JSON-schema-validation

:ship: See on GHCR. Use by referencing with full name as printed above, mind the registry name.

github-actions[bot] avatar May 07 '24 18:05 github-actions[bot]

To me, it looks great already! I have two small points that could be beneficial for integration into external tools/libs:

The message is good, but it is currently limited to a human-readable format. Adding a JSONPath or any other path that displays the exact location would allow other libraries to do more with the validation result. Furthermore, adding different error types could help. For instance, if a type validation fails, this could be indicated.

If I could imagine a response example it would look something like this:

Note: the paths below are illustrative and not accurate.

{
  "is_valid": "no",
  "errors": [
    {
      "location": "citation/fields/0/value",
      "error_type": "required",
      "message": "The title field is required."
    },
    {
      "location": "citation/fields/1/value",
      "error_type": "invalid",
      "message": "The description must be a string."
    }
  ]
}

Is it possible to derive such a format from your validator?
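To make the proposed shape concrete, here is a minimal Python sketch that walks one metadata block and collects errors with a location and an error type. It is only an illustration under assumptions: `validate_block` is a hypothetical helper, the rules it applies (non-empty required fields, string values for single primitive fields) are simplified stand-ins, and `is_valid` is emitted as a boolean rather than a string.

```python
import json


def validate_block(block_name: str, block: dict, required: set[str]) -> dict:
    """Collect structured errors for one metadata block (hypothetical rules)."""
    errors = []
    seen = set()
    for index, entry in enumerate(block.get("fields", [])):
        name = entry.get("typeName")
        seen.add(name)
        location = f"{block_name}/fields/{index}/value"
        value = entry.get("value")

        if value in (None, "", []):
            # An empty value only matters when the field is required.
            if name in required:
                errors.append({
                    "location": location,
                    "error_type": "required",
                    "message": f"The {name} field is required.",
                })
            continue

        single_primitive = (
            entry.get("typeClass") == "primitive" and not entry.get("multiple")
        )
        if single_primitive and not isinstance(value, str):
            errors.append({
                "location": location,
                "error_type": "invalid",
                "message": f"The {name} field must be a string.",
            })

    # Required fields that never appear are reported at the block level.
    for name in sorted(required - seen):
        errors.append({
            "location": f"{block_name}/fields",
            "error_type": "required",
            "message": f"The {name} field is required.",
        })
    return {"is_valid": not errors, "errors": errors}


block = {
    "fields": [
        {"typeName": "title", "multiple": False, "typeClass": "primitive", "value": ""},
        {"typeName": "subtitle", "multiple": False, "typeClass": "primitive", "value": 42},
    ]
}
report = validate_block("citation", block, required={"title"})
print(json.dumps(report, indent=2))
```

A client library could then branch on `error_type` and map `location` back to a form field without parsing the human-readable message.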

JR-1991 avatar Jul 11 '24 12:07 JR-1991

> To me, it looks great already! I have two small points that could be beneficial for integration into external tools/libs:
>
> The message is good, but it is currently limited to a human-readable format. Adding a JSONPath or any other path that displays the exact location would allow other libraries to do more with the validation result. Furthermore, adding different error types could help. For instance, if a type validation fails, this could be indicated.
>
> Is it possible to derive such a format from your validator?

Unfortunately, the validator stops at the first issue it finds, so returning a list of errors is not possible. It may be possible to add more information to the error to help pinpoint the exact location of the issue.
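Even with fail-fast behavior, a single error can carry its location. The Python sketch below shows the general idea with a depth-first walk that raises at the first empty value and records a slash-separated path to it. `ValidationFailure` and `require_non_empty` are hypothetical names, the "no empty values" rule is an arbitrary example, and the path does not apply JSON Pointer escaping; none of this reflects the validation library used in the PR.

```python
class ValidationFailure(Exception):
    """Fail-fast error that remembers where in the document it occurred."""

    def __init__(self, location: str, message: str):
        super().__init__(f"{message} (at {location})")
        self.location = location


def require_non_empty(node, path: str = "") -> None:
    """Depth-first walk that raises at the first null or empty string."""
    if isinstance(node, dict):
        for key, child in node.items():
            require_non_empty(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            require_non_empty(child, f"{path}/{index}")
    elif node is None or node == "":
        raise ValidationFailure(path or "/", "value must not be empty")


doc = {
    "metadataBlocks": {
        "citation": {
            "fields": [
                {"typeName": "title", "value": ""},
                {"typeName": "subtitle", "value": None},
            ]
        }
    }
}

try:
    require_non_empty(doc)
except ValidationFailure as err:
    # Only the first problem is reported; the second empty value is never reached.
    print(err.location)
```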

stevenwinship avatar Jul 12 '24 14:07 stevenwinship

The PR looks great; I meant to merge it last night. I didn't go out of my way trying to break it, but all the valid JSON files I tried were accepted, and all the broken ones were recognized as invalid, with meaningful error messages. The error message for the case where required fields are missing could be prettier, to be honest (maybe list the specific fields that must be populated?), but it didn't feel serious enough to complain about. Merging.

landreev avatar Aug 14 '24 19:08 landreev