
Create Flask Route for DCAT-US validation

Opened by jbrown-xentity

User Story

So that they can validate DCAT-US files, data.gov data providers want a route that validates a file against the latest DCAT-US schema.

Acceptance Criteria

  • [ ] GIVEN the data provider Flask app exists
    WHEN a user posts a valid DCAT-US file to /dcat-us/validate
    THEN the response confirms that the metadata is valid

  • [ ] GIVEN the data provider Flask app exists
    WHEN a user posts an invalid DCAT-US file to /dcat-us/validate
    THEN the response reports that the metadata is invalid
    AND it provides a detailed analysis of what is invalid in the dataset
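One way to meet the "detailed analysis" criterion is to collect every schema error rather than stopping at the first, and tag each with the JSON path of the offending element. The sketch below assumes the `jsonschema` library does the validating; `TOY_CATALOG_SCHEMA` is a hypothetical, heavily trimmed stand-in for the real DCAT-US schema, and the report shape (`valid`, `error_count`, `errors`) is only a proposal.

```python
from jsonschema.validators import validator_for

# Hypothetical stand-in for the real DCAT-US 1.1 catalog schema; the actual
# schema is much larger and would be loaded from file, not defined inline.
TOY_CATALOG_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["dataset"],
    "properties": {
        "dataset": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["title", "identifier"],
                "properties": {
                    "title": {"type": "string", "minLength": 1},
                    "identifier": {"type": "string"},
                },
            },
        }
    },
}


def validate_dcat_us(document, schema):
    """Validate a parsed DCAT-US document and return a JSON-serializable report.

    All errors are collected (not just the first), each tagged with the
    path of the offending element so the caller knows exactly what to fix.
    """
    # Pick the validator class that matches the schema's "$schema" keyword.
    validator = validator_for(schema)(schema)
    errors = [
        {
            # e.g. "dataset/0/title"; an empty string means the document root
            "path": "/".join(str(part) for part in error.absolute_path),
            "message": error.message,
            "rule": error.validator,  # failing schema keyword, e.g. "required"
        }
        for error in validator.iter_errors(document)
    ]
    # Sort for stable, diff-friendly output.
    errors.sort(key=lambda e: (e["path"], e["message"]))
    return {"valid": not errors, "error_count": len(errors), "errors": errors}
```

With this toy schema, `{"dataset": [{"title": "", "identifier": 5}]}` produces two errors, at `dataset/0/title` (rule `minLength`) and `dataset/0/identifier` (rule `type`).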

Background

A validator currently exists on catalog: https://catalog.data.gov/dcat-us/validator. Its main limitation is that it does not accept an uploaded file, only a URL to a file. The API should support the following options:

  • Content (DCAT-US text submitted directly in the request, if applicable)
  • File stream (an uploaded DCAT-US file, if applicable)
  • URL (a link to a DCAT-US file, if applicable)
  • Schema version (defaults to DCAT-US 1.1; also accepts 1.0, and should accommodate future versions)
  • Non-federal validation (defaults to false, i.e. federal validation)
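The options above could be normalized before any validation runs. This is a minimal stdlib-only sketch; the field names `content`, `url`, `schema_version`, and `non_federal`, and the rule that exactly one input source must be given, are assumptions for illustration, not a settled API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Adding a future schema version would mean extending this tuple.
SUPPORTED_VERSIONS = ("1.0", "1.1")
DEFAULT_VERSION = "1.1"


@dataclass(frozen=True)
class ValidateOptions:
    source: str          # which input was supplied: "content", "file", or "url"
    schema_version: str  # "1.0" or "1.1"
    non_federal: bool    # False means federal validation (the default)


def _parse_bool(raw: Optional[str]) -> bool:
    """Parse a form/query flag; a missing or empty value means False."""
    if raw is None or raw == "":
        return False
    value = raw.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError(f"non_federal must be true or false, got {raw!r}")


def parse_options(form: Mapping[str, str], has_file: bool) -> ValidateOptions:
    """Normalize request options (hypothetical field names)."""
    candidates = (
        ("content", bool(form.get("content"))),
        ("file", has_file),
        ("url", bool(form.get("url"))),
    )
    sources = [name for name, present in candidates if present]
    if len(sources) != 1:
        raise ValueError("provide exactly one of content, file, or url")
    version = form.get("schema_version") or DEFAULT_VERSION
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"unsupported schema_version {version!r}; expected one of {SUPPORTED_VERSIONS}"
        )
    return ValidateOptions(sources[0], version, _parse_bool(form.get("non_federal")))
```

Rejecting ambiguous requests (for example both `content` and `url`) up front keeps the validator from having to guess which input the caller meant.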

Security Considerations (required)

Long term, we may want to run this as a separate application for security reasons, since it accepts data from anyone and sanitizing input is hard. In the meantime, sanitize input where possible.
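Two cheap first-line checks for the URL and file-stream options are restricting which URLs the server will fetch and capping document size. The sketch below uses only the standard library; the 10 MiB limit and the function names are hypothetical. It inspects the URL text only and does not resolve DNS, so a public hostname that points at a private address would still pass and would need to be caught when the fetch actually happens.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_DOCUMENT_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap


def check_source_url(url: str) -> str:
    """Reject URLs that are obviously unsafe to fetch server-side."""
    parts = urlsplit(url)
    # Only plain web URLs; this blocks file://, ftp://, and similar schemes.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme {parts.scheme!r}")
    host = parts.hostname  # lowercased, IPv6 brackets stripped
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("refusing to fetch from localhost")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return url  # a DNS name rather than an IP literal
    # Blocks loopback, private, and link-local literals (e.g. 169.254.169.254).
    if not ip.is_global:
        raise ValueError(f"refusing to fetch from non-public address {ip}")
    return url


def check_size(body: bytes) -> bytes:
    """Refuse documents larger than the configured cap."""
    if len(body) > MAX_DOCUMENT_BYTES:
        raise ValueError(f"document exceeds {MAX_DOCUMENT_BYTES} bytes")
    return body
```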

Sketch

See above for background. Create a Flask route called /dcat-us/validate that implements the validation provided in datagov-harvesting-logic. The output of this route should be more parseable and usable than that of the current validator; see https://github.com/GSA/data.gov/issues/4427 for current pain points.
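A minimal sketch of the route, assuming Flask plus the `jsonschema` library as a stand-in for the validation in datagov-harvesting-logic (whose API is not assumed here). The placeholder schemas, the `schema_version` query parameter, the `file` form field, and the 10 MiB limit are all hypothetical; URL input and non-federal validation are left out to keep it short.

```python
import json

from flask import Flask, jsonify, request
from jsonschema.validators import validator_for

app = Flask(__name__)
# Hypothetical 10 MiB cap; Flask rejects larger request bodies with 413.
app.config["MAX_CONTENT_LENGTH"] = 10 * 1024 * 1024

# Placeholder schemas. A real implementation would load the DCAT-US 1.0 and
# 1.1 schema files instead of defining one tiny schema inline.
_PLACEHOLDER_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["dataset"],
    "properties": {"dataset": {"type": "array"}},
}
SCHEMAS = {"1.0": _PLACEHOLDER_SCHEMA, "1.1": _PLACEHOLDER_SCHEMA}


@app.route("/dcat-us/validate", methods=["POST"])
def validate_dcat_us():
    # Schema version comes from the query string; 1.1 is the default.
    version = request.args.get("schema_version", "1.1")
    schema = SCHEMAS.get(version)
    if schema is None:
        return jsonify(error=f"unsupported schema_version {version!r}"), 400

    # Accept either a multipart upload (field "file") or a raw JSON body.
    if request.mimetype == "multipart/form-data":
        upload = request.files.get("file")
        if upload is None:
            return jsonify(error="multipart request needs a 'file' field"), 400
        raw = upload.read()
    else:
        raw = request.get_data()

    try:
        document = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        return jsonify(valid=False, errors=[{"path": "", "message": f"not valid JSON: {exc}"}]), 400

    validator = validator_for(schema)(schema)
    errors = [
        {"path": "/".join(str(p) for p in err.absolute_path), "message": err.message}
        for err in validator.iter_errors(document)
    ]
    # The request itself succeeded, so valid and invalid documents both get 200.
    return jsonify(valid=not errors, errors=errors), 200
```

Returning 200 with `"valid": false` for an invalid document, and reserving 4xx codes for malformed requests, is one way to keep the output easy to parse; it is a design choice open for discussion rather than a requirement from this issue.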

jbrown-xentity, Feb 29 '24 22:02