augur icon indicating copy to clipboard operation
augur copied to clipboard

Improve validation output to identify problematic nodes / properties

Open jameshadfield opened this issue 1 year ago • 0 comments

Validation against schemas is currently extremely useful at identifying when a dataset doesn't conform but often doesn't help pinpoint the part of the dataset that needs fixing.

As an example, this Auspice dataset ash_10samps.json (from https://github.com/nextstrain/auspice/issues/1756) failed validation in two places:

$ augur validate export-v2 ash_10samps.json
Validating schema of 'ash_10samps.json'...
  .tree {"children": [{"branch_attrs": {"labels": {"id":…} failed oneOf validation for [{"$ref": "#/$defs/tree"}, {"type": "array", "minItems": 1, "items": {"$ref": "#/$defs/tree"}}]
    validation for arm 0: {"$ref": "#/$defs/tree"}
      .tree {"children": [{"branch_attrs": {"labels": {"id":…} failed required validation for ["name", "node_attrs"]
    validation for arm 1: {"type": "array", "minItems": 1, "items": {"$ref": "#/$defs/tree"}}
      .tree {"children": [{"branch_attrs": {"labels": {"id":…} failed type validation for "array"
  .meta {"colorings": [{"key": "country", "title": "Coun…} failed required validation for ["updated", "panels"]
FATAL ERROR: Validation of 'ash_10samps.json' failed.

In instead we could change the error reporting to something like the following then debugging would be a whole lot easier, both for us internally and for external users:

- .tree {"children": [{"branch_attrs": {"labels": {"id":…} failed required validation for ["name", "node_attrs"]
+ .tree node name "wrapper" is missing the required "node_attrs" object
- .meta {"colorings": [{"key": "country", "title": "Coun…} failed required validation for ["updated", "panels"]
+ .meta object is missing the required "updated" property

jameshadfield avatar Mar 03 '24 23:03 jameshadfield