Dynamic validation based on keys in data (internal reference)
Hey,
I have an idea for dynamic schema validation, which I think is currently lacking (perhaps not a very common request, though I believe it could be implemented as another validator).
In this case, I have a data YAML file where a mapping is expected. The keys may be anything the user decides, and the values are according to some predefined schema - so far, so good.
Next, at a later point in the data YAML, another mapping occurs. Here, a value in that mapping has to be one of the keys defined earlier.
For example (a data YAML):
metrics:
  iou: # this can be whatever the user chooses
    name: foo
    unit: bar
    direction: up
  ...
results:
  - name: something
    metric: iou # this has to be one of the keys defined under `metrics` above
    value: 59
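To make the requirement concrete, here is a minimal sketch of the check in plain Python, run on the already loaded data. It is not part of the library: the function name find_dangling_metric_refs is made up, the data is written as a dict literal to avoid a YAML dependency, and the second results entry is invented to show a failing case.

```python
def find_dangling_metric_refs(data):
    """Return (index, metric) pairs for results whose metric is not a key under `metrics`."""
    defined = set(data.get("metrics", {}))
    return [
        (i, result.get("metric"))
        for i, result in enumerate(data.get("results", []))
        if result.get("metric") not in defined
    ]


# Same shape as the YAML above, plus one invented entry that refers to an undefined metric.
data = {
    "metrics": {"iou": {"name": "foo", "unit": "bar", "direction": "up"}},
    "results": [
        {"name": "something", "metric": "iou", "value": 59},
        {"name": "other", "metric": "accuracy", "value": 12},
    ],
}

print(find_dangling_metric_refs(data))  # [(1, 'accuracy')]
```

The point of the request is to express this kind of check in the schema itself rather than in a separate post-processing step.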
A similar feature has been requested earlier: #154
I hadn't noticed that one, my apologies.
I'm not sure why it's not within the scope of this project. Internal references are common, and since the file is read anyway, the schema can be dynamic in that sense...
The validators themselves only have access to the object being validated, so I imagine this would require quite a large refactoring of the project to support this.
Is that the case? It's been a while since I contributed to the project so I don't recall the details, but it seems that in schema.py#L80, the full contents of data are passed around.
That would suggest, for example, that this type of validator could be deferred until the data is provided.
It's the Validator class that is used to validate an object, and it only receives the object itself, not the full YAML file. So this can't be solved by simply introducing a new validator.
Of course this would require a bit more complicated implementation (i.e. not simply subclassing the Validator class). That shouldn't be a problem though.
If I understand correctly, the full flow is as follows:
- Create a dictionary of validators in Schema (the method _process_schema returns a dictionary which is then assigned to self._schema).
- Calls to Schema.validate pass the full YAML data content.
- The validate method passes the dictionary self._schema, which then winds down to the _validate_static_map_list method.
- In _validate_static_map_list, the keys of the data and the keys of the validator map are compared. If they mismatch, an error is raised. If they do match, we start iterating on a per-key basis by passing along the sub_validator, the key, and again, the full YAML data, to _validate_item.
- Finally, in _validate_item, we try to pull the relevant data-item from the full (or parent, since it recurses down eventually) YAML content, and then call _validate with the validator and data-item.
So, my suggestion would be to allow validators to be deferred at a higher level here. For example, a Validator could hold a boolean is_deferred, in which case we do not attempt to pull the specific data-item from the YAML content, but rather pass along either the parent or the full YAML data, depending on the reference type.
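A rough sketch of how such a flag could work, using toy classes rather than the project's actual code. Everything here is an assumption made for illustration: the Validator, String and KeyOf classes, the validate_map function, and the choice to still pull the item and pass the full data alongside it (a simplification of the idea above).

```python
class Validator:
    """Toy stand-in for a validator; not the library's actual class."""
    is_deferred = False

    def is_valid(self, value):
        return True


class String(Validator):
    def is_valid(self, value):
        return isinstance(value, str)


class KeyOf(Validator):
    """Hypothetical deferred validator: the value must be a key of root[path]."""
    is_deferred = True

    def __init__(self, path):
        self.path = path

    def is_valid(self, value, root):
        target = root.get(self.path, {})
        return isinstance(target, dict) and value in list(target)


def validate_map(schema, data, root=None, prefix=""):
    """Check each schema key against data; return a list of error strings."""
    root = data if root is None else root
    errors = []
    for key, validator in schema.items():
        path = f"{prefix}{key}"
        if key not in data:
            errors.append(f"{path}: required key missing")
            continue
        item = data[key]
        # Deferred validators additionally receive the full data.
        if validator.is_deferred:
            ok = validator.is_valid(item, root)
        else:
            ok = validator.is_valid(item)
        if not ok:
            errors.append(f"{path}: '{item}' is not valid")
    return errors


root = {
    "metrics": {"iou": {"name": "foo"}},
    "results": [
        {"name": "something", "metric": "iou"},
        {"name": "other", "metric": "f1"},  # invented entry, not defined under metrics
    ],
}
result_schema = {"name": String(), "metric": KeyOf("metrics")}

errors = []
for i, result in enumerate(root["results"]):
    errors += validate_map(result_schema, result, root=root, prefix=f"results.{i}.")
print(errors)
```

Ordinary validators keep their one-argument signature, so only the dispatch point needs to know about the flag. Whether that maps cleanly onto _validate_item in the real code base is something I have not verified.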
Hi,
Five years ago, I proposed two new Validators in #82 that may answer your need. My PR was refused because the maintainers don't want dynamic schemas. In some ways it is good practice to have a static schema.
Arnaud.