schematics Allow passing a callable to `choices`

As a developer I want the ability to avoid generating the list of choices for a type during class definition (which is usually at module scope) as doing so for all models might be expensive. This pull request makes choices behave like default. As with default, users should take care to do their own caching of choices if the computation of these values is expensive.

May 13 '16 23:05 chadrik

Okay, the patch looks good. I'm curious, what would be an example of an expensive choices computation?

I've always considered initialization performance to be fairly unimportant relative to runtime performance, but if choices caching is a worthwhile optimization then I need to change that assumption.

May 14 '16 08:05 bintoro

Examples could be anything populated dynamically, like checking a database or file system for a list of registered options.

I didn't add any caching of the values returned by the callable because there is a chance that users will want the choices to refresh on subsequent calls, as well as to keep the behavior with how default works, and also because it's easy enough to add caching to your choices callable yourself if that is the behavior that you want.

May 14 '16 14:05 chadrik

@bintoro so what do you think?

May 25 '16 18:05 chadrik

Right, sorry, I was waiting to see if anyone else would like to chime in.

I've been thinking about use cases for dynamic choices, and I would expect a pretty common scenario to be one where the set of valid choices depends on another field on the model. But that case cannot be addressed with choices; you have to use a model-level validator anyway:

def validate_foo(cls, data, value, context):
    ... # Compute choices.
    if value not in choices:
        raise ValidationError(...)

That's already pretty succinct. Using a callable as choices may be slightly more convenient but gives you a less powerful version (without access to data). If the set of choices is large, you likely don't want to replicate it in the error message, which means you'll have to add messages to accompany choices or make a custom type. At that point, one must ask if a validator method might in fact be the cleaner approach.

So basically my concern is that perhaps choices isn't the right tool for the job if the job is complicated. Callables would only accommodate a rather specific case where the list of choices cannot be computed in advance but the thing it depends on is external to the model.

Please don't take this as steadfast opposition. It's just that there have been PRs that were accepted with too little review IMO, and I'd rather vet everything thoroughly in the first place. I'm on the fence on this because personally I don't see it as very useful and it adds a bit of processing overhead for every field. I'm hoping one of the other owners would cast a vote one way or the other.

May 25 '16 23:05 bintoro

In your example, validate_foo would have to be implemented as a method on a Model subclass, correct?

Your counter-point makes sense, but it's problematic for me because one of my requirements is that the list of choices be inspectable from outside the model. This is necessary for doing things like building a json-schema document or auto-generating a UI based on a model. This would not be possible to do reliably with model-level validators, or loose validator functions passed to __init__ because the choices would be contained within an anonymous callable.

Then there's obviously the simplicity factor. It's nice to just pass a callable and be done with it. The current implementation doesn't add much complexity to the code, though I understand your POV that all these niche PRs can add up to a lot of extra cruft.

While I still like my idea, it's only fair to explore alternatives. For example, in marshmallow all validation is done via validation classes, be it regular expression, choices, or min/max ranges. There's a logic to that, but I prefer schematic's simplicity. e.g. I prefer this:

StringType(regex='[a-z]+')

..to this:

StringType(validators=[MatchRegex('[a-z]+')])

And the same logic applies to choices. Also, I don't like that with marshmallow you can provide a validator instance to a field type that doesn't make sense (e.g. regex validator to a numeric type).

All that said, the reason I suggested using classes is that it allows us to provide even more information to a validation (e.g. human-readable labels for choices, or an option to cache callable results) while still keeping the validations inspectable (e.g. by searching for a Choices instance in type_instance.validators):

class MyModel(Model):
    foo = StringType(
        validators=[Choices([1, 2, 3], labels=['one', 'two', 'three'])])
    bar = StringType(
        validators=[Choices(my_func, caching=True)])

So, if this PR is rejected I can always use this technique. However, it might be nice to provide a base Validator class in schematics as a convenience, and maybe even a Choices class with some documentation explaining the additional features. It provides both some utility and a working example of how to solve problems like this.

Anyway, no worries about letting this wait for feedback.

May 26 '16 16:05 chadrik

Great analysis. I can appreciate the introspection issue, and it does rule out a model-level validator as a substitute.

The validator class approach is appealing because of how general it is. It gave me an idea:

class Choices(FieldValidator):

    def __init__(self, elements, message):
        if not isinstance(elements, (Iterable, Callable)):
            raise TypeError
        self._elements = elements
        self.message = message

    @property
    def elements(self):
        if callable(self._elements):
            return self._elements()
        else:
            return self._elements

    def __call__(self, value, context):
        if value not in self.elements:
            raise ValidationError(self.message.format(str(self.elements)))

    def __contains__(self, element):
        return element in self.elements

    def __iter__(self):
        return iter(self.elements)

    def __len__(self):
        return len(self.elements)

Then, in BaseType.__init__:

if choices:
    self.choices = Choices(choices, self.messages['choices'])
    self.validators.append(self.choices)
else:
    self.choices = None

(A Choices instance would also be a valid input for choices, in which case it would be used as such.)

This would buy us a few things:

A callable would be OK for choices, but the handler would be tucked away in the Choices class.
We'd still get the extension capabilities you talked about, like caching.
The validator object would always be accessible at BaseType.choices (as opposed to sometimes appearing only in the validators list).
BaseType.validate_choices would go away completely, eliminating the current overhead of calling it for every field.

And, since Choices implements __iter__, we'd also fulfil the basic contract that BaseType.choices exposes the choices as an iterable.

Does this make sense?

EDIT: Removed references to set because compound types can have choices too.

May 27 '16 15:05 bintoro

I like this plan, and it warrants some more exploration.

Firstly, It might be a bit odd to use this approach in only one place. I think we should consider what we might gain from using the FieldValidator elsewhere. For example:

StringType.regex: an optional human-readable description of the regex to add to the failure message. It also gives us a good place to cache the compiled regex, if we want to add that feature.
NumberType/DecimalType.min_value, NumberType/DecimalType.max_value: could add inclusive boolean, for controlling > vs >=. also allows code reuse between NumberType and DecimalType.

The problem here is that I don't think we can reasonably make each of these validators behave like the data type they proxy. i.e. I doubt we want to do this:

class Regex(FieldValidator, str):
    ...

Next up, the placement of the messages in the class MESSAGES attribute complicates things when passing a Choices object. For example, this is not so pretty:

class MyModel(Model):
    foo = StringType(
        choices=Choices([1, 2, 3], message=BaseType.messages['choices']))

It makes sense for the canonical location for the choices message to be on the Choices validator, which would make passing in the message unnecessary, but I'm not sure if that's too big of a breaking change to make.

May 31 '16 22:05 chadrik

Hi all, I would love to get this branch merged, so I can stop using a custom build of schematics. I re-read the thread, and after some consideration I still like my original proposal for its simplicity.

Let me summarize what I think are the key points of this PR:

Use cases would be anything that should be deferred until model instantiation for performance or timing reasons, like checking a database or file system for a list of registered options.
Alternatives to the original proposal either require a lot of work (validator classes), or are inconvenient (creating a validator method, using a kind of lazy container to wrap the callable).
There is already precedent for allowing callables as an alternative to a static value: defaults.

I'll resolve the conflicts once we make up our mind. Let me know what you think.

Jul 13 '18 18:07 chadrik

I agree with your original patch idea, it's simple enough to merge and backwards compatible.

Jul 14 '18 17:07 lkraider