marshmallow icon indicating copy to clipboard operation
marshmallow copied to clipboard

Wrapped `validates_schema` method is not called when `.load` is provided with a generator.

Open ziplokk1 opened this issue 2 years ago • 4 comments

Issue

When passing a generator to .load, the wrapped validates_schema method is never called.

Expected

When passing a generator to .load, it should behave the same as passing a list or tuple and call the method wrapped by validates_schema.

Environment

python v3.9.5 marshmallow 3.13.0

Example

>>> from marshmallow import validates_schema, Schema, fields
>>> class MySchema(Schema):
...     fld = fields.String()
...     @validates_schema()
...     def _validate(self, data, **kwargs):
...         print('validate schema called')
>>> MySchema().load(({'fld': 'abc'}, {'fld': 'def'}), many=True)
validate schema called
validate schema called
[{'fld': 'abc'}, {'fld': 'def'}]
>>> def gen(): yield from ({'fld': 'abc'}, {'fld': 'def'})
>>> MySchema().load(gen(), many=True)
[{'fld': 'abc'}, {'fld': 'def'}]

ziplokk1 avatar Oct 28 '21 21:10 ziplokk1

I don't think this argument is intended to accept generator iterators. The Iterable annotation appears to be a documentation regression introduced in https://github.com/marshmallow-code/marshmallow/pull/1488. The annotation should have be Sequence.

deckar01 avatar Nov 11 '21 20:11 deckar01

That still poses the risk of the validator accepting a generator at runtime and not behaving as one would expect. Is it viable to implement an assertion or a warning if a generator is provided so that there is some form of runtime guard?

ziplokk1 avatar Nov 11 '21 21:11 ziplokk1

Oh, there is a validator and it does explicitly allow any iterable.

https://github.com/marshmallow-code/marshmallow/blob/e9f151f3afdd82c5e74e45a36cc693d1ffeff5da/src/marshmallow/utils.py#L48-L55

I suspect we would want to wrap it in list rather than teeing it. It looks like dump() has the same issue.

deckar01 avatar Nov 11 '21 21:11 deckar01

I started on a patch to do the list() invocation, but I'm hesitating to submit it. The following change passes all tests:

diff --git a/src/marshmallow/schema.py b/src/marshmallow/schema.py
index ff119a4..449a45c 100644
--- a/src/marshmallow/schema.py
+++ b/src/marshmallow/schema.py
@@ -836,11 +836,22 @@ class Schema(base.SchemaABC, metaclass=SchemaMeta):
         unknown = unknown or self.unknown
         if partial is None:
             partial = self.partial
+        # before preprocessors, convert data to a list if it is an iterable and many=True
+        # this ensures that if a generator is used, it will be iterated only once
+        processed_data: typing.Mapping[str, typing.Any] | list[
+            typing.Mapping[str, typing.Any]
+        ] = data
+        if is_collection(data) and many:
+            processed_data = list(data)
         # Run preprocessors
         if self._has_processors(PRE_LOAD):
             try:
                 processed_data = self._invoke_load_processors(
-                    PRE_LOAD, data, many=many, original_data=data, partial=partial
+                    PRE_LOAD,
+                    processed_data,
+                    many=many,
+                    original_data=data,
+                    partial=partial,
                 )
             except ValidationError as err:
                 errors = err.normalized_messages()

But this means that if I have a pre_load(many=True) hook which is meant to consume a special collection type, you'll get a list instead. So it's a potentially breaking change in order to support a usage which I don't think was really meant to be supported.

I would prefer updating the annotation to take Sequence, tweak any docs appropriately, and treat this as an annotation inaccuracy. No runtime behavior changes.

As for checking and rejecting generators as inputs... we could, but you can also use other types which implement Iterator but not Sequence. I'm not convinced that an explicit is_generator check adds much value, even though it's easy to add.

sirosen avatar May 26 '22 14:05 sirosen