compliance-trestle
Addition of validation modes for metaschema and other constraints
Issue description / feature objectives
There is a need to add additional validation constraints to the python classes in trestle, and there are a number of ways to add them. This is a summary of options.
Many of the constraints defined by the NIST OSCAL schema are captured in the JSON schema, but not all. Those that are captured are generated automatically in the python oscal classes, for example as required vs. optional fields, or as a regular expression defining the allowed form of a string.
But higher level constraints in the metaschema cannot be captured by direct translation and need to be handled separately.
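As a rough illustration of that split, the sketch below uses only the standard library. The `Part` class, the `TOKEN_PATTERN` regex and the uniqueness rule are all hypothetical stand-ins, not trestle's generated classes or actual metaschema constraints. The pattern check is the kind of rule a JSON schema can carry on a single field, while the duplicate check needs to look across objects and so has to live elsewhere.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern standing in for a schema-derived "token" regex.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


@dataclass
class Part:
    """Hypothetical model, not an actual trestle class."""

    name: str                    # required field
    title: Optional[str] = None  # optional field

    def __post_init__(self) -> None:
        # Field-level constraint: expressible as a regex in a JSON schema,
        # so it can be checked automatically on instantiation.
        if not TOKEN_PATTERN.fullmatch(self.name):
            raise ValueError(f"invalid token: {self.name!r}")


def duplicate_names(parts: List[Part]) -> List[str]:
    """Cross-object constraint (assumed for illustration): names must be unique.

    A per-field pattern cannot express this, so it needs a separate check
    over the whole collection.
    """
    seen, dupes = set(), []
    for part in parts:
        if part.name in seen and part.name not in dupes:
            dupes.append(part.name)
        seen.add(part.name)
    return dupes
```

Here each `Part` validates itself when created, but a list of individually valid parts can still fail the collection-level rule.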
There is an added complication that trestle needs to be able to generate a valid example of each class at runtime, which means there needs to be an ability to "invert" the validation and generate valid example models somehow. This means there needs to be a connection between the validation code and the generation code.
Here are ways to do these things:
- There is a validation factory that allows addition of arbitrary validation routines for specific models. This has the advantage of doing arbitrarily complex checks - but it is not invoked automatically on instantiation of objects, and there is no linkage to the generate code. It needs to be invoked on import and export of the model, and validation rules would need to be duplicated somehow in the generate code.
- The generate code recognizes certain field types by name and can insert standard strings such as "REPLACE_ME" - with the knowledge that string is consistent with the regex.
- For some other fields, the regex string is checked and a corresponding valid string is generated based on recognizing that regex - such as a UUID. In these cases the goal isn't to create a random but valid string, because it would look like a "real" UUID; instead a clearly fake one with many zeros is provided. The resulting duplication of UUID values may itself make the model invalid until those values are replaced.
- Some strings in the generated python classes have regex patterns that are general "token" strings - but the details in the OSCAL documentation clarify that only certain values are allowed, such as "control" or "statement". For that specific case, the class generated by gen_oscal.py can have the regex switched from a generic token to "(control|statement)". This means the validation will happen at runtime during class instantiation and it won't require a separate validation. In addition, there is a reverse regex module called "xeger" that can take a given regex pattern and generate a random string consistent with that pattern. If the generate code makes use of this module, it can generate valid strings directly from the regex and not require coupling with the validation code.
- Another way to add a runtime validation check is to insert it directly into the generated python class for the model. This is already done to validate the OSCAL version: the `oscal_normalize.py` script inserts a validation function into the `OscalVersion` class. This has the advantage of automatic execution at runtime, but the disadvantage of not being directly linked to the generate code.
- Higher level rules such as those in `links_validator.py` are more complex and require matching links and references to confirm none is missing or redundant. Validations like that are a good fit for the validation factory, with corresponding checks only on import and export.
- In all this there is a need to provide flexibility in handling models that are not completely valid so that working with trestle isn't too rigid - particularly as models are being edited and refined.
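The regex-narrowing option in the list above can be sketched as follows. This is a minimal illustration, not trestle code: `PartName` and `generate_from_alternation` are hypothetical names, and the hand-written generator is a standard-library stand-in for a reverse regex module that handles only plain alternations such as "(control|statement)". It shows one pattern serving both as the validation rule at instantiation and as the source of generated values.

```python
import random
import re
from typing import Optional

# Hypothetical narrowed pattern, replacing a generic token regex.
PART_NAME_PATTERN = r"(control|statement)"


class PartName:
    """Hypothetical wrapper that validates its value on instantiation."""

    def __init__(self, value: str) -> None:
        if not re.fullmatch(PART_NAME_PATTERN, value):
            raise ValueError(f"{value!r} does not match {PART_NAME_PATTERN}")
        self.value = value


def generate_from_alternation(pattern: str, rng: Optional[random.Random] = None) -> str:
    """Pick one literal from a pattern of the form "(a|b|c)".

    Stand-in for a reverse regex library; anything other than a single
    parenthesized alternation of literals is rejected.
    """
    if not (pattern.startswith("(") and pattern.endswith(")")):
        raise ValueError("only simple alternations are supported in this sketch")
    choices = pattern[1:-1].split("|")
    rng = rng or random.Random()
    return rng.choice(choices)


# The generated value is valid by construction, so instantiation succeeds
# without any generate-side copy of the validation rule.
example = PartName(generate_from_alternation(PART_NAME_PATTERN))
```

Because the generator reads the same pattern the class validates against, changing the allowed values in one place updates both sides. Cross-object rules, such as the link matching mentioned above, would still need a separate validator.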