augur
augur copied to clipboard
Improve validation output to identify problematic nodes / properties
Validation against schemas is currently extremely useful at identifying when a dataset doesn't conform but often doesn't help pinpoint the part of the dataset that needs fixing.
As an example, this Auspice dataset ash_10samps.json (from https://github.com/nextstrain/auspice/issues/1756) failed validation in two places:
$ augur validate export-v2 ash_10samps.json
Validating schema of 'ash_10samps.json'...
.tree {"children": [{"branch_attrs": {"labels": {"id":…} failed oneOf validation for [{"$ref": "#/$defs/tree"}, {"type": "array", "minItems": 1, "items": {"$ref": "#/$defs/tree"}}]
validation for arm 0: {"$ref": "#/$defs/tree"}
.tree {"children": [{"branch_attrs": {"labels": {"id":…} failed required validation for ["name", "node_attrs"]
validation for arm 1: {"type": "array", "minItems": 1, "items": {"$ref": "#/$defs/tree"}}
.tree {"children": [{"branch_attrs": {"labels": {"id":…} failed type validation for "array"
.meta {"colorings": [{"key": "country", "title": "Coun…} failed required validation for ["updated", "panels"]
FATAL ERROR: Validation of 'ash_10samps.json' failed.
In instead we could change the error reporting to something like the following then debugging would be a whole lot easier, both for us internally and for external users:
- .tree {"children": [{"branch_attrs": {"labels": {"id":…} failed required validation for ["name", "node_attrs"]
+ .tree node name "wrapper" is missing the required "node_attrs" object
- .meta {"colorings": [{"key": "country", "title": "Coun…} failed required validation for ["updated", "panels"]
+ .meta object is missing the required "updated" property