cerberus icon indicating copy to clipboard operation
cerberus copied to clipboard

Proposal: Add document level validation

Open biscuit314 opened this issue 6 years ago • 2 comments

Use-case abstract

Sometimes a document should be considered invalid based on the values of multiple fields. For example, in a medical record where gender==male and is_pregnant==True is not valid, even though each field looked at individually complies with the schema ('allowed': ['male', 'female'] and 'type': 'boolean' respectively).


Feature request

Some documents as a whole should be considered invalid based on more than one field. Take this schema, for example:

schema = {
    'name': {'type': 'string'},
    'aka': {'type': 'string'},
    'gender': {
        'type': 'string',
        'allowed': ['male', 'female']
    },
    'is_pregnant': {'type': 'boolean'}
}

This document should be invalid because males can't be pregnant (even if they have the right to be):

medical_record = {
    'name': 'Stan',
    'aka': 'Loretta',
    'gender': 'male',
    'is_pregnant': True
}

(with apologies to Life of Brian)

This can be accomplished currently with a custom validator like this:

class MyValidator(Validator):
    def _validate_preg_check(self, preg_check, field, value):
        """ {'type': 'boolean'} """
        if field == 'gender':
            gender = value
            is_pregnant = self.document.get('is_pregnant')
        elif field == 'is_pregnant':
            gender = self.document.get('gender')
            is_pregnant = value
        else:
            return

        is_valid = True
        if is_pregnant:
            is_valid = gender == 'female'
        if not(preg_check and is_valid):
            self._error(field, 'Invalid: males cannot be pregnant')

Now we add the rules to the schema:

schema = {
    'name': {'type': 'string'},
    'aka': {'type': 'string'},
    'gender': {
        'type': 'string',
        'allowed': ['male', 'female'],
        'preg_check': True
    },
    'is_pregnant': {
      'type': 'boolean', 
      'preg_check': True
    }
}

and validation fails as we expect...

validator = MyValidator(schema)
validator.validate(medical_record)

...with the following errors dict:

{
  'gender': ['Invalid: males cannot be pregnant'], 
  'is_pregnant': ['Invalid: males cannot be pregnant']
}

What's wrong with this?

  • have to apply the rule to each field involved in the check
  • therefore, the rule fires for each field when it needs to only fire once
  • the custom validator has to have an if/elif/else to cover all fields involved
  • the errors dict doesn't cleanly tell the story. There is nothing wrong with either the gender field nor the is_pregnant field. The only problem is those two values together in the same document

This example involves only two fields. The problems compound when there are many more fields and many more document level checks.

Better is a simpler custom validator that looks like this (with the if/elif/else gone):

class MyValidator(Validator):
    def _validate_document_preg_check(self, preg_check, field, value):
        """ {'type': 'boolean'} """
        is_valid = True
        if self.document.get('is_pregnant'):
            is_valid = self.document.get('gender') == 'female'
        if not(preg_check and is_valid):
            self._error(field, 'This medical record is invalid')

Now the schema can return to the original:

schema = {
    'name': {'type': 'string'},
    'aka': {'type': 'string'},
    'gender': {
        'type': 'string',
        'allowed': ['male', 'female']
    },
    'is_pregnant': {'type': 'boolean'}
}

This validation also fails as we expect...

validator = MyValidator(schema, document_validations={'preg_check': True})
validator.validate(medical_record)

...but this time with the following errors dict:

{
  '_document': ['Invalid: males cannot be pregnant']
}

biscuit314 avatar Sep 16 '18 17:09 biscuit314

Hi @funkyfuture - I was looking over this since I have a use case for some simple document level validation and I have a few questions which you may be able to answer to help guide me in a reasonable implementation.

  1. My understanding is that Cerberus can't currently do this kind of thing, except via abuse of custom field validators as described in the summary of this issue. Is that correct?
  2. Does this feature still seem like a reasonable one to you now?
  3. Would you be interested in an implementation that took a "batteries included" approach? ie. Provide a DocumentValidator variant of Validator which not only respects the current schema rules but specifically implements things like validating that:
    • specified fields have the same value (foo and bar must have the same value)
    • specified fields have specific combinations of values (if foo is 42, bar must be -1)
    • specified fields (or combinations of fields) are mutually excluded (excludes but with combinations) (foo and bar cannot both be present)
    • I'd envision this being used with something like:
schema = {"foo": {"type": "int"}, "bar": {"type": "int"}, "baz": {"type": "int"}}
document_rules = [
  {"rule": "equal", "fields": ["foo", "bar"]},  # if foo and bar are present, they must have equal values
  {"rule": "mutex", "fields": [["foo", "bar"], "baz"]},  # if foo and bar are present, baz cannot be
]
v = DocumentValidator(schema, document_rules)
  1. Would it be appropriate to implement this as a subclass (or mixin) which provides DocumentValidator?
  2. Should the actual implementation logic live in base.py or could it live in validation.py?

I'm happy to have a discussion here and/or make a new issue to track this idea if you'd prefer.

Edit: Ah I do see that excludes exists already as well. Amended above.

maybe-sybr avatar Jan 18 '21 03:01 maybe-sybr

Hi maybe-sybr,

I haven't looked at this in a while. My team forked Eve/Cerberus to continue with this feature and some other bug fixes.

This PR was not picked up because we had bundled it with other changes. I hope to return to this perhaps by the summer. We'll unbundle it then resubmit the PR. We have some other changes we may submit too.

  • Michael Ottoson

From: "maybe-sybr" [email protected] To: "pyeve/cerberus" [email protected] Cc: "michael" [email protected], "Author" [email protected] Sent: Sunday, January 17, 2021 10:08:05 PM Subject: Re: [pyeve/cerberus] Proposal: Add document level validation (#444)

Hi [ https://github.com/funkyfuture | @funkyfuture ] - I was looking over this since I have a use case for some simple document level validation and I have a few questions which you may be able to answer to help guide me in a reasonable implementation.

1. My understanding is that Cerberus can't currently do this kind of thing, except via abuse of custom field validators as described in the summary of this issue. Is that correct? 
2. Does this feature still seem like a reasonable one to you now? 
3. Would you be interested in an implementation that took a "batteries included" approach? ie. Provide a DocumentValidator variant of Validator which not only respects the current schema rules but specifically implements things like validating that: 
    * specified fields have the same value (foo and bar must have the same value) 
    * specified fields have specific combinations of values (if foo is 42, bar must be -1) 
    * specified fields (or combinations of fields) are mutually excluded (the opposite of dependencies ) (foo and bar cannot both be present) 
    * I'd envision this being used with something like: 

schema = { "foo" : { "type" : "int" }, "bar" : { "type" : "int" }, "baz" : { "type" : "int" }} document_rules = [ { "rule" : "equal" , "fields" : [ "foo" , "bar" ]}, # if foo and bar are present, they must have equal values { "rule" : "mutex" , "fields" : [[ "foo" , "bar" ], "baz" ]}, # if foo and bar are present, baz cannot be ] v = DocumentValidator ( schema , document_rules )

1. Would it be appropriate to implement this as a subclass (or mixin) which provides DocumentValidator ? 
2. Should the actual implementation logic live in base.py or could it live in validation.py ? 

I'm happy to have a discussion here and/or make a new issue to track this idea if you'd prefer.

— You are receiving this because you authored the thread. Reply to this email directly, [ https://github.com/pyeve/cerberus/issues/444#issuecomment-761950186 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/ACD25627KWLVUX54VL2V2KDS2OQZLANCNFSM4FVL3L7A | unsubscribe ] .

biscuit314 avatar Jan 25 '21 14:01 biscuit314

as we don't intend to add new features to the API, i'm closing this issue.

funkyfuture avatar Jul 23 '23 15:07 funkyfuture