Validation errors are hard to present safely to the user (missing abstraction)
https://cyclonedx-python-library.readthedocs.io/en/v10.2.0/autoapi/cyclonedx/validation/
We are using both JSON and XML inputs, and when something is wrong with the input, it is not easy to get the location of the problem or even what is wrong can be hidden in a multi-MB message.
One of the problem is, that the underlying libraries make it hard:
jsonschemaincludes all the input (instance) in the error message, which in the SBOM case can be quite big, producing the above mentioned multi-MB message (thisuniqueItemscheck can fail on e.g. thedependencies):yield ValidationError(f"{instance!r} has non-unique elements")- in the xml case, somehow the easiest solution was to get the error from the logs: https://github.com/CycloneDX/cyclonedx-python-lib/blob/1a932a2ab00efb029c7b685cba7d9e5af3b7ea19/cyclonedx/validation/xml.py#L71
The other problem is, that CycloneDX makes no attempt at transforming these different object types into something sensible and type-safe for users, the raw objects are simply leaked through the interface as is in https://github.com/CycloneDX/cyclonedx-python-lib/blob/1a932a2ab00efb029c7b685cba7d9e5af3b7ea19/cyclonedx/validation/init.py#L36
Code samples triggering long messages:
from cyclonedx.validation.json import JsonStrictValidator
from cyclonedx.schema import SchemaVersion
test_data_file = "tests/_data/schemaTestData/1.2/invalid-license-id-1.2.json"
schema_version = SchemaVersion.V1_2
validator = JsonStrictValidator(schema_version)
with open(test_data_file) as tdfh:
test_data = tdfh.read()
validation_error = validator.validate_str(test_data)
print(str(validation_error))
This message is 35508 characters long - 767 lines!
from cyclonedx.validation.xml import XmlValidator
from cyclonedx.schema import SchemaVersion
test_data_file = "tests/_data/schemaTestData/1.1/invalid-license-id-1.1.xml"
schema_version = SchemaVersion.V1_1
validator = XmlValidator(schema_version)
with open(test_data_file) as tdfh:
test_data = tdfh.read()
validation_error = validator.validate_str(test_data)
print(str(validation_error))
This message is 12423 characters long - 1 line.
I would expect the errors returned/raised by CycloneDX something like below:
class ValidationError:
# abstract class
data: Any
"raw problem, for debugging"
path: str
message: str
class XmlValidationError(ValidationError):
# this subclass knows what data is
@property
def path(self):
return self.data.path
@property
def message(self):
return self.data.message
class JsonValidationError(ValidationError):
# this subclass knows what data is
@property
def path(self):
return self.data.json_path
@property
def message(self):
# ensures the error is transformed to something sensible
# resolving a problem caused by using jsonscheme for CycloneDX users
instance = repr(self.data.instance)
return self.data.message.replace(instance, shortened(instance))
# where shortened(long_text) ~ 'first n ... last n', that is the middle of the string replaced
# this would still add some context, but it will be safe to display
These would provide a stable abstraction over generally useful validation error properties, and also hide implementation details from users, like third party objects lxml.etree._LogEntry and jsonschema.exceptions.ValidationError. The above proposal is also backward compatible, keeping data intact, if someone depends on it.
This python library is a community effort. Feel free to donate improvements via pullrequeasts. Here are our contribution guidelines: https://github.com/CycloneDX/cyclonedx-python-lib/blob/main/CONTRIBUTING.md
PS: please be aware that the underlying validation libraries are intended to be exchangeable - don't depend too much on their infra.
let's discuss how a useful validation.ValidationError could look like. (dont care for feasibility right now; let's share our dreams/wishes and tell user stories)
I'll go first - just a very minimalistic outline of something that i would love to have
class ValidationError:
"""Validation failed with this specific error."""
def __init__(self, message: str) -> None:
self._message = message
@property
def message () -> str:
"""Humanreadable, brief error message"""
# exameples for XML error: `invalid value "foo" for enum field /bom/components[13]/@type`
return self._message
def __repr__(self) -> str:
... # the usual stuff
def __str__(self) -> str:
... # the usual stuff
class XmlValidationError(ValidationError):
pass
class JsonValidationError(ValidationError):
pass
this is just my kick-off - not a final proposal :-) Please post your ideas and examples in the comments below.
PS: I would really be interested what the user story for path. what are you going to do with it, if you had it?
This is very similar to what was proposed above, except for the path attribute, so I have nothing else to add to this interface.
The message implementation is however non-trivial, as it is not clear how to reduce the size for human consumption (I have edited the description to include code samples using test data triggering some longish messages).
path is not special - it is just information, which might or might not be present in the string representation message, but is very useful info for developers: some tool must have created the bad sbom, and path shows where the problem is.
In an ideal world we could happily use just message, but we might have more type specific useful info to expose (e.g. in the XML case there seems to be a line property as well).
Is there any user story for the path and line except being informational for debugging?
I mean, is there some automatism that would need the path/line in an atomic field, and if so, what's the user story behind it?
For me, the user story is this: this is an error object, that is created, when the input document has a problem and is rejected. This object may hold information that can be used to help in fixing either the document or the process leading to the document. Path and line are information that can be presented as part of the message to the user who will do this fixing. They may use it for debugging either the document or some tool that produced the invalid SBOM document.
The library provided message may be good for 80% of cases (or more), but maybe not all, and if it is in the not so helpful case, it may be better to build a new end-user message from the error properties, than use the library provided one. I can not give you examples when the library provided shortened message will not be sufficient, as I have no data yet, however the code I am working on already extends the messages with the path info.
Please also note, that it is not necessarily the same group of people who are creating the sbom document (and performing the debugging on rejected ones) and who are using the cyclonedx library.
I see. I will dig into the validator libraries in use, and see if i can collaborate with the respective maintainers in case the needed information is not publicly accessible.