[feature][WIP] YAML Schema validation for syntax compatibility check
With respect to the latest backwards-incompatible changes, input files may require some compatibility check. It is quite common that drawings are revised after many years. ~~May I suggest adding an (optional) "syntax version" tag? How else could it be assured that future versions of WireViz support the syntax of an older one?~~
Note that following issue was raised by @nanangp for adding versioning checks on the CLI syntax as well: https://github.com/nanangp/vscode-wireviz-preview/issues/6
Going a bit further, some of the settings given as a command line argument, could also be listed in a header of the YAML file. This could either be a comment or incorporated in the WireViz syntax, like the technical drawing changes require adding some general info for the title block as well.
This suggestion might also be important for #223 YAML parser replacement.
I changed the title because I believe that schema validation provides just what we need here. It allows to check your YAML even before running WireViz.
Example from https://github.com/redhat-developer/yaml-language-server :
Associating a schema in the YAML file It is possible to specify a yaml schema using a modeline. Schema url can be a relative path. If a relative path is specified, it is calculated from yaml file path, not from workspace root path
# yaml-language-server: $schema=<urlToTheSchema>
Found this file by @cooked as a starting point: wireviz-yaml-schema.json
TODO
- [x] Implement the WireViz syntax according to the current spec in syntax.md
- [x] Recognize cable and wire designator through pattern matching
- [ ] Validate that designators in connections exist
- [ ] Recognize special designators and auto-generation syntax in connections
- [ ] Assure the correct order of cables and wires in the connections list
Additional to JSON Schema based validation that is widely supported, there is also 23andMe/Yamale which relies on its own schema defined in YAML itself. It can read data through either PyYAML or Ruamel, so choosing this would depend on the parser choice taken in #223.
I would like to mention that checks like in https://github.com/wireviz/WireViz/commit/9b2b22d6aee17ea99a6f61d2e5d4eba0426ab028 will not be needed if schema validation was done programmatically.
Note that the validation should also take effect with the GH action Create Examples once implemented.
Just providing a quick update:
- I have been fiddling with JSON Schema. It is a powerful tool, but things quickly get complex.
- I am planning to evaluate Yamale and compare both solutions. It seems easier to define the schema in YAML.
Any comments on this are welcome. The main question that arises: Do we also need to provide live Schema validation for editing i.e. in VS Code? See also https://github.com/nanangp/vscode-wireviz-preview/issues/7 on this topic.
I have attached the YAML Schema for Yamale. GH would not accept the file extension, so I had to rename it to .TXT.
schema.yaml
# WireViz YAML Schema for syntax validation using Yamale
connectors: map(include('CONNECTOR')) # dictionary of all used connectors
cables: map(include('CABLE')) # dictionary of all used cables and wires
connections: list(include('CONNECTION')) # list of all connections to be made
additional_bom_items: list(include('additional_bom-item'), required=False) # custom items to add to BOM
metadata: include('metadata', required=False) # dictionary of meta-information describing the harness
options: include('option', required=False) # dictionary of common attributes for the whole harness
tweak: include('tweak', required=False) # optional tweaking of .gv output
---
DESIGNATOR: regex('^[a-zA-Z\200-\377_-][0-9a-zA-Z\200-\377_\\.]*$', name='designator', multiline=False)
---
CONNECTOR: # unique connector designator/name
# pinout information
# at least one of the following must be specified
pincount: int(min=1, required=False) # if omitted, is set to length of specified list(s)
pins: list(required=False) # if omitted, is autofilled with [1, 2, ..., pincount]
pinlabels: list(required=False) # if omitted, is autofilled with blanks
# pin color marks (optional)
pincolors: list(required=False) # list of colors to be assigned to the respective pins;
# general information about a connector (all optional)
type: str(required=False)
subtype: str(required=False)
color: include('colorname', required=False) # see below
image: include('image', required=False)
notes: str(required=False)
# product information (all optional)
ignore_in_bom: bool(required=False) # if set to true the connector is not added to the BOM
pn: subset(int(),str(),allow_empty=True) # [internal] part number
manufacturer: str(required=False) # manufacturer name
mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
supplier: str(required=False) # supplier name
spn: subset(int(),str(),allow_empty=True) # supplier part number
additional_components: list(include('additional_component'), required=False) # additional components
# rendering information (all optional)
bgcolor: include('colorname', required=False) # Background color of diagram connector box
bgcolor_title: include('colorname', required=False) # Background color of title in diagram connector box
style: str(equals='simple', required=False) # may be set to simple for single pin connectors
show_name: bool(required=False) # defaults to true for regular connectors, false for simple connectors
show_pincount: bool(required=False) # defaults to true for regular connectors false for simple connectors
hide_disconnected_pins: bool(required=False) # defaults to false
loops: list(required=False) # every list item is itself a list of exactly two pins on the connector that are to be shorted
---
CABLE: # unique cable designator/name
# conductor information
# the following combinations are permitted:
# wirecount only no color information is specified
# colors only wirecount is inferred from list length
# wirecount + color_code colors are auto-generated based on the specified
# color code (see below) to match the wirecount
# wirecount + colors colors list is trimmed or repeated to match the wirecount
wirecount: int(min=1, required=False)
colors: list(required=False) # list of colors (see below)
color_code: include('colorcode', required=False) # one of the supported cable color codes (see below)
wirelabels: list(required=False) # optional; one label for each wire
# general information about a connector (all optional)
category: str(equals='bundle', required=False) # may be set to bundle;
type: str(required=False)
gauge: subset(int(), num(), str(min=1), allow_empty=True) # allowed formats:
show_equiv: bool(required=False) # defaults to false; can auto-convert between mm2 and AWG and display the result when set to true
length: subset(num(), str(min=1), allow_empty=True) # num() is assumed to be in meters unless <unit> is specified
shield: subset(bool(), include('colorname'), allow_empty=True) # defaults to false
color: include('colorname', required=False) # see below
image: include('image', required=False)
notes: str(required=False)
# product information (all optional)
ignore_in_bom: bool(required=False) # if set to true the cable or wires are not added to the BOM
pn: subset(int(),str(),allow_empty=True) # [internal] part number
manufacturer: subset(str(required=False),allow_empty=True) # manufacturer name
mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
supplier: subset(str(),allow_empty=True) # supplier name
spn: subset(int(),str(),allow_empty=True) # supplier part number
additional_components: list(include('additional_component'), required=False) # additional components
# rendering information (all optional)
bgcolor: include('colorname', required=False) # Background color of diagram cable box
bgcolor_title: include('colorname', required=False) # Background color of title in diagram cable box
show_name: bool(required=False) # defaults to true
show_wirecount: bool(required=False) # defaults to true
show_wirenumbers: bool(required=False) # defaults to true for cables; false for bundles
---
CONNECTION: list(any(include('DESIGNATOR'),list(include('DESIGNATOR')),map(),enum('--','<--','<-->','-->','==','<==','<==>','==>')),required=False) # list of all connections to be made
---
additional_bom-item: # custom items to add to BOM
description: str()
# all the following are optional:
qty: num(required=False) # qty to add to the bom (defaults to 1)
unit: str(required=False)
designators: list(include('DESIGNATOR'), required=False)
pn: subset(int(),str(),allow_empty=True) # [internal] part number
manufacturer: str(required=False) # manufacturer name
mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
supplier: str(required=False) # supplier name
spn: subset(int(),str(),allow_empty=True) # supplier part number
---
additional_component:
type: str() # type of additional component
# all the following are optional:
subtype: str(required=False) # additional description (only shown in bom)
qty: num(required=False) # qty to add to the bom (defaults to 1)
qty_multiplier: str(required=False) # multiplies qty by a feature of the parent component
unit: str(required=False)
pn: subset(int(),str(),allow_empty=True) # [internal] part number
manufacturer: str(required=False) # manufacturer name
mpn: subset(int(),str(),allow_empty=True) # manufacturer part number
supplier: str(required=False) # supplier name
spn: subset(int(),str(),allow_empty=True) # supplier part number
bgcolor: include('colorname', required=False) # Background color of entry in diagram component box
---
metadata: map(key=str(), required=False) # dictionary of meta-information describing the harness
---
option: # dictionary of common attributes for the whole harness
bgcolor: include('colorname', required=False) # Default = 'WH'
bgcolor_node: include('colorname', required=False) # Default = 'WH'
bgcolor_connector: include('colorname', required=False) # Default = bgcolor_node
bgcolor_cable: include('colorname', required=False) # Default = bgcolor_node
bgcolor_bundle: include('colorname', required=False) # Default = bgcolor_cable
color_mode: enum('full','FULL','hex','HEX','short','SHORT','ger','GER', required=False)
fontname: str(required=False) # Default = 'arial'
mini_bom_mode: bool(required=False) # Default = True
---
tweak: # optional tweaking of .gv output
override: subset(map(required=False),allow_empty=True) # dict of .gv entries to override
append: any(str(),list(),required=False) # string or list of strings to append to the .gv output
---
colorname: any(enum('BK','WH','GY','PK','RD','OG','YE','OL','GN','TQ','LB','BU','VT','BN','BG','IV','SL','CU','SN','SR','GD'),regex('#[0-9A-Fa-f]{6}',name='html_color'))
colorcode: enum('DIN', 'IEC', 'TEL', 'TELALT', 'T568A', 'T568B', 'BW')
---
image:
src: str() # path to the image file
# optional parameters:
caption: str(required=False) # text to display below the image
bgcolor: include('colorname', required=False) # Background color of entry in diagram component box
width: int(required=False) # range: 1~65535; unit: points
height: int(required=False) # range: 1~65535; unit: points
Running Yamale with schema.yaml in the WireViz folder will validate all YAML files from there. It even found that the GitHub Actions would not match and that there is an issue with ex14.yml...
C:\Git\martinrieder\WireViz>yamale --no-strict -n 1 -s schema.yaml
Finding yaml files...
Found 26 yaml files.
Validating...
Validation failed!
Error validating data 'C:\Git\martinrieder\WireViz\.github\workflows\main.yml' with schema 'schema.yaml'
connectors: Required field missing
cables: Required field missing
connections: Required field missing
----
Error validating data 'C:\Git\martinrieder\WireViz\examples\ex14.yml' with schema 'schema.yaml'
connections.0.7: '<=>' is not a designator.
connections.0.7: '<=>' is not a list.
connections.0.7: '<=>' is not a map.
connections.0.7: '<=>' not in ('--', '<--', '<-->', '-->', '==', '<==', '<==>', '==>')
@martinrieder wrote:
[...]
C:\Git\martinrieder\WireViz>yamale --no-strict -n 1 -s schema.yaml Finding yaml files... Found 26 yaml files. Validating... Validation failed! Error validating data 'C:\Git\martinrieder\WireViz\.github\workflows\main.yml' with schema 'schema.yaml'
.github\workflows\main.yml is not a WireViz input file, and uses another syntax.
connectors: Required field missing cables: Required field missing connections: Required field missing
None of these are required to be valid WireViz input. All documented top-level keys are optional, and any other top-level keys are just ignored. A random non-documented top-level key is often used to make templates to inherit later in the YAML input, e.g. in demo02.yml, ex05.yml, and ex06.yml. The syntax documentation is currently not very clear about this.
WireViz will happily process YAML input including only a single key, and it doesn't have to be one of the documented ones. Any non-dict as top-level will fail because yaml_data is assumed to be a dict.
Do you want to validate what WireViz really accept, or a stricter, more limited syntax interpreted from a syntax description not written precisely enough for validation?
With simple use cases it makes sense to drop top-level sections without contents. With v0.4, connections is required to obtain a non-empty diagram, but e.g. #267 is a use case where only additional_bom_items is needed.
Error validating data 'C:\Git\martinrieder\WireViz\examples\ex14.yml' with schema 'schema.yaml' connections.0.7: '<=>' is not a designator. connections.0.7: '<=>' is not a list. connections.0.7: '<=>' is not a map. connections.0.7: '<=>' not in ('--', '<--', '<-->', '-->', '==', '<==', '<==>', '==>')
Your validation rules are stricter than what WireViz accepts. The syntax documentation doesn't list all legal variations. See https://github.com/wireviz/WireViz/blob/v0.4/src/wireviz/wv_helper.py#L128-L140
@kvid wrote:
Do you want to validate what WireViz really accept, or a stricter, more limited syntax interpreted from a syntax description not written precisely enough for validation?
Neither of both. This is just an investigation about finding a tool that would suit the following purpose:
- Validation at runtime: simple compatibility check for old YAML files
- Validation while editing: syntax highlighting and code completion
The latter is closely related to https://github.com/nanangp/vscode-wireviz-preview/issues/7
.github\workflows\main.ymlis not a WireViz input file, and uses another syntax. [...] Your validation rules are stricter than what WireViz accepts. The syntax documentation doesn't list all legal variations.
There is no sense in proving that it passes the tests. I need to put up some strict rules to show that it will detect some issues, even by feeding it the wrong files.
Thanks for the hint to the code. Copying the syntax description into a schema format that Yamale expects was quite easy. You proved that this manual process is not ideal though. Yet, the resulting file is still human-readable, kind of self-explaining...
The point that @liambeguin is making in https://github.com/wireviz/WireViz/issues/220#issuecomment-2158760443 is absolutely valid. If the schema could be generated from code, that would make it much easier to maintain.
Uploading my DRAFT of the JSON Schema file here. Please validate your files with this and let me know if I missed something.
The file structure is based on the Yamale schema, which in turn is an interpretation of the manually written syntax description.
I did some adaptions after looking at DataClasses.py and thereby noticed that image was missing the attributes scale and fixedsize. This is corrected by PR #386.
EDIT: I would like to clarify that I do not see much sense in putting extra effort into maintaining schema files manually. Consider this still an evaluation for proof of concept only!
Ideally, there should be some functionally in WireViz to generate matching files based on internal data structures:
- JSON Schema definitions
- YAML example/templates
It seems that pydantic is the only tool to support this. Thanks to @liambeguin for suggesting this!