flask-rebar
streaming response bodies for many=True schemas
It looks like request handlers that have response_body_schema=Foo(many=True) needlessly build up the entire list of Foos in memory before even starting to send the first Foo back to the client.
Just quickly searched through the Marshmallow tracker and didn't see anything about this, but then I searched through the code and I think marshmallow.Schema.dump() is the culprit:

    if many and is_iterable_but_not_string(obj):
        obj = list(obj)

Of course, the request handler can require paging and enforce max page sizes etc., but it'd still be better if response bodies for JSON arrays could be streamed given how common many=True response bodies are. The stdlib's json module is perfectly capable of streaming, and there's no benefit to building these lists up in memory before starting to send the response. Am I missing something, or is this just a gap in Marshmallow? If so, it seems pretty easy for Rebar to work around this as long as the gap exists. Thanks!
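For reference, the streaming capability in the stdlib is json.JSONEncoder.iterencode(), which yields the encoded output as a series of string chunks. A minimal sketch follows; write_json_chunks is a hypothetical helper name, not something from Rebar or Marshmallow. Note that the default encoder still expects a list or tuple here, not a generator object, so this alone does not make the input lazy.

```python
import io
import json


def write_json_chunks(obj, fp):
    """Write obj to fp as JSON, one encoder chunk at a time.

    Hypothetical helper: returns the number of chunks written.
    """
    count = 0
    for chunk in json.JSONEncoder().iterencode(obj):
        fp.write(chunk)
        count += 1
    return count


# The chunks, concatenated, match what json.dumps() produces in one shot.
buf = io.StringIO()
write_json_chunks([{"id": 1}, {"id": 2}], buf)
assert buf.getvalue() == json.dumps([{"id": 1}, {"id": 2}])
```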
This would be an amazing enhancement, but it's a doozy.
1) Looking over Marshmallow, it seems like there are a lot of built-in assumptions that you are dumping everything at once. I'm not sure how you'd have dump return a generator without breaking existing pre/post dump processors that do things like add envelopes. 😞
2) We're using Flask's jsonify to dump the marshalled objects. It does not appear to support generators, so you'd have to implement a version that does handle them, which seems like it's got a lot of gotchas. (I think this might not actually be as easy as passing the Response.stream property into json.dump, but I've never tried that 🤷)
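To make point 2 concrete: Flask sits on top of WSGI, and a WSGI response body can be any iterable of byte strings, so a replacement for jsonify would ultimately need to hand back something like the generator below. This is a plain-WSGI sketch under that assumption, not Flask's or Rebar's API, and make_streaming_app is a hypothetical name.

```python
def make_streaming_app(make_chunks):
    """Build a WSGI app whose body comes lazily from make_chunks().

    make_chunks is a hypothetical zero-argument callable returning an
    iterable of str chunks that together form one JSON document.
    """
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "application/json")])
        # A generator body means no chunk is encoded until it is requested.
        return (chunk.encode("utf-8") for chunk in make_chunks())
    return app


# Call the app directly with a dummy environ and start_response.
app = make_streaming_app(lambda: iter(["[", "0", ",", "1", "]"]))
body = app({}, lambda status, headers: None)
assert b"".join(body) == b"[0,1]"
```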
Can Rebar work around (1) by detecting a many=True Schema, converting it to a many=False Schema, and using that to dump() an item at a time, yielding each dumped result from within a generator?
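Roughly what I have in mind, as a sketch only: ItemSchema below is a hypothetical stand-in for a Schema instantiated with many=False (anything with a dump(obj) method works), and dump_each is a made-up name. Hooks that need to see the whole collection, like the envelope processors mentioned above, would not run this way.

```python
class ItemSchema:
    """Hypothetical stand-in for a Schema created with many=False."""

    def dump(self, obj):
        # Serialize a single (id, name) tuple.
        return {"id": obj[0], "name": obj[1]}


def dump_each(item_schema, objs):
    """Lazily dump objs one at a time with a single-item schema."""
    for obj in objs:
        yield item_schema.dump(obj)


rows = iter([(1, "a"), (2, "b")])
dumped = dump_each(ItemSchema(), rows)
first = next(dumped)  # only the first row has been pulled from rows so far
assert first == {"id": 1, "name": "a"}
```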
Agreed that (2) is annoying but doesn't seem like a show-stopper.
Also re (2), I think the implementation in that post could be simplified a lot. Something along the lines of:

    from json import dumps

    def dump_streaming(iterable):
        """Yield the JSON array for iterable one chunk at a time."""
        yield "["
        for n, item in enumerate(iterable):
            if n:
                # Separator before every item except the first.
                yield ","
            # Unlike a None-terminated loop, this also handles None items.
            yield dumps(item)
        yield "]"

    >>> ''.join(dump_streaming(range(0)))
    '[]'
    >>> ''.join(dump_streaming(range(1)))
    '[0]'
    >>> ''.join(dump_streaming(range(2)))
    '[0,1]'
    >>> ''.join(dump_streaming(range(3)))
    '[0,1,2]'

Opened an issue in Marshmallow for this FWIW: marshmallow-code/marshmallow#1696