data.gov
Create Flask Route for DCAT-US validation
User Story
In order to validate a DCAT-US file, data.gov data providers want a route that validates against the latest DCAT-US schema.
Acceptance Criteria
- [ ] GIVEN the data provider Flask app exists
  WHEN a user posts to /dcat-us/validate with a valid DCAT-US file
  THEN the metadata is confirmed to be good
- [ ] GIVEN the data provider Flask app exists
  WHEN a user posts to /dcat-us/validate with an invalid DCAT-US file
  THEN the metadata is confirmed to be bad
  AND a detailed analysis is provided of what is invalid in the dataset
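To make the two criteria concrete, here is a minimal sketch of a response body the route could return. The `Issue` class, the `build_report` helper, and the field names (`valid`, `error_count`, `errors`) are all hypothetical; nothing here reflects an agreed response format or the output of datagov-harvesting-logic.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Issue:
    # Hypothetical shape for one validation problem.
    path: str     # where in the document the problem was found
    message: str  # human-readable explanation


def build_report(issues: list[Issue]) -> dict:
    """Return a JSON-serializable verdict plus per-issue details."""
    return {
        "valid": not issues,          # "good" when there are no issues
        "error_count": len(issues),
        "errors": [asdict(issue) for issue in issues],
    }


good = build_report([])
bad = build_report([Issue("dataset/0/title", "'title' is a required property")])
print(json.dumps(bad, indent=2))
```

A Flask view could pass such a dictionary to `jsonify`, which would give callers one stable structure for both the "good" and the "bad" case.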
Background
The current validator lives on catalog: https://catalog.data.gov/dcat-us/validator. It's not great: it doesn't accept an uploaded file, only a URL to a file. The options we want to support in the API:
- Content (DCAT-US text, if applicable)
- File stream (DCAT-US text, if applicable)
- URL (DCAT-US text, if applicable)
- Schema version (defaults to DCAT-US 1.1, also accepts 1.0; kept as a parameter so future versions can be added)
- Non-federal validation (defaults to false, federal validation)
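The options above could be normalized before validation along these lines. This is only a sketch: the field names (`content`, `url`, `schema_version`, `non_federal`), the `ValidationOptions` class, and the rule that exactly one input source must be supplied are assumptions, not decisions recorded in this issue.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Versions named in the list above; 1.1 is the stated default.
SUPPORTED_VERSIONS = ("1.1", "1.0")


@dataclass
class ValidationOptions:
    content: Optional[str] = None   # DCAT-US text from the body or an uploaded file
    url: Optional[str] = None       # location of a DCAT-US file to fetch
    schema_version: str = "1.1"
    non_federal: bool = False       # False means federal validation


def parse_options(fields: Mapping[str, str],
                  file_text: Optional[str] = None) -> ValidationOptions:
    """Normalize raw request fields; reject missing or ambiguous input."""
    url = fields.get("url") or None
    sources = [s for s in (fields.get("content"), file_text, url) if s]
    if len(sources) != 1:
        # Assumption: exactly one of content, file stream, or URL per request.
        raise ValueError("provide exactly one of: content, file, url")
    version = fields.get("schema_version", "1.1")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema version: {version}")
    flag = str(fields.get("non_federal", "false")).lower()
    return ValidationOptions(
        content=fields.get("content") or file_text,
        url=url,
        schema_version=version,
        non_federal=flag in ("true", "1", "yes"),
    )
```

In a Flask view, `fields` could come from `request.form` and `file_text` from a decoded entry in `request.files`; a `ValueError` would then map naturally to a 400 response.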
Security Considerations (required)
Long term, we may want to support this as a separate application for security considerations (as we are accepting data from the world, and sanitizing input is hard). Consider sanitizing input if possible.
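As one illustration of what sanitizing input might involve, the sketch below screens a submitted URL and caps payload size using only the standard library. The helper names and the 10 MiB limit are hypothetical, and the check is deliberately incomplete: it does not resolve DNS, so a hostname that points at an internal address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_BYTES = 10 * 1024 * 1024  # hypothetical upload/download cap (10 MiB)


def is_safe_url(url: str) -> bool:
    """Allow only http(s) URLs whose host is not an obviously internal address."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal; DNS is not checked here
    return not (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_reserved)


def within_size_limit(payload: bytes, limit: int = MAX_BYTES) -> bool:
    """Reject payloads larger than the configured cap."""
    return len(payload) <= limit
```

Checks like these reduce the most obvious risks of fetching arbitrary URLs, but they do not replace the isolation a separate application would provide.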
Sketch
See above for background. Create a Flask route called /dcat-us/validate and implement the validation provided in datagov-harvesting-logic. The output of this route should be more parseable and usable than the current validator's; see https://github.com/GSA/data.gov/issues/4427 for current pain points.