marshmallow
marshmallow copied to clipboard
Wrapped `validates_schema` method is not called when `.load` is provided with a generator.
Issue
When passing a generator to .load
, the wrapped validates_schema
method is never called.
Expected
When passing a generator to .load
, it should behave the same as passing a list or tuple and call the method wrapped by validates_schema
.
Environment
python v3.9.5 marshmallow 3.13.0
Example
>>> from marshmallow import validates_schema, Schema, fields
>>> class MySchema(Schema):
... fld = fields.String()
... @validates_schema()
... def _validate(self, data, **kwargs):
... print('validate schema called')
>>> MySchema().load(({'fld': 'abc'}, {'fld': 'def'}), many=True)
validate schema called
validate schema called
[{'fld': 'abc'}, {'fld': 'def'}]
>>> def gen(): yield from ({'fld': 'abc'}, {'fld': 'def'})
>>> MySchema().load(gen(), many=True)
[{'fld': 'abc'}, {'fld': 'def'}]
I don't think this argument is intended to accept generator iterators. The Iterable
annotation appears to be a documentation regression introduced in https://github.com/marshmallow-code/marshmallow/pull/1488. The annotation should have be Sequence
.
That still poses the risk of the validator accepting a generator at runtime and not behaving as one would expect. Is it viable to implement an assertion or a warning if a generator is provided so that there is some form of runtime guard?
Oh, there is a validator and it does explicitly allow any iterable.
https://github.com/marshmallow-code/marshmallow/blob/e9f151f3afdd82c5e74e45a36cc693d1ffeff5da/src/marshmallow/utils.py#L48-L55
I suspect we would want to wrap it in list
rather than tee
ing it. It looks like dump()
has the same issue.
I started on a patch to do the list()
invocation, but I'm hesitating to submit it. The following change passes all tests:
diff --git a/src/marshmallow/schema.py b/src/marshmallow/schema.py
index ff119a4..449a45c 100644
--- a/src/marshmallow/schema.py
+++ b/src/marshmallow/schema.py
@@ -836,11 +836,22 @@ class Schema(base.SchemaABC, metaclass=SchemaMeta):
unknown = unknown or self.unknown
if partial is None:
partial = self.partial
+ # before preprocessors, convert data to a list if it is an iterable and many=True
+ # this ensures that if a generator is used, it will be iterated only once
+ processed_data: typing.Mapping[str, typing.Any] | list[
+ typing.Mapping[str, typing.Any]
+ ] = data
+ if is_collection(data) and many:
+ processed_data = list(data)
# Run preprocessors
if self._has_processors(PRE_LOAD):
try:
processed_data = self._invoke_load_processors(
- PRE_LOAD, data, many=many, original_data=data, partial=partial
+ PRE_LOAD,
+ processed_data,
+ many=many,
+ original_data=data,
+ partial=partial,
)
except ValidationError as err:
errors = err.normalized_messages()
But this means that if I have a pre_load(many=True)
hook which is meant to consume a special collection type, you'll get a list
instead. So it's a potentially breaking change in order to support a usage which I don't think was really meant to be supported.
I would prefer updating the annotation to take Sequence
, tweak any docs appropriately, and treat this as an annotation inaccuracy. No runtime behavior changes.
As for checking and rejecting generators as inputs... we could, but you can also use other types which implement Iterator but not Sequence. I'm not convinced that an explicit is_generator
check adds much value, even though it's easy to add.