connexion
Handling streaming request data
Description
Using just Flask, a user can (manually) use request.input_stream to handle streaming uploads. My use case is binary uploads that won't fit in memory. I use the JSON validation features of connexion when the Content-Type is application/json, but in this case the Content-Type is application/octet-stream.
With the connexion library I can't seem to work out how to access the raw data stream. It looks like the data all gets read in FlaskApi.get_request from EndOfRequestLifecycleDecorator, which happens well before any user-defined handler function is called.
Is it possible to not wrap flask.request with ConnexionRequest for some endpoints?
Expected behaviour
For an unrecognized content-type, or for a specific binary stream, connexion should, like Flask, not read() the full request body before calling a handler. Instead the stream should be pulled on demand, usually when a user accesses request.data or request.json.
Actual behaviour
Deep in the bowels of connexion the stream gets fully read and I couldn't see how to disable it for one endpoint.
Steps to reproduce
Upload a file, e.g. with requests:
import requests

file_path = 'large.bin'
with open(file_path, 'rb') as f:
    r = requests.post(
        url + '/upload',
        headers={
            'Content-Type': 'application/octet-stream',
            'Content-Length': '123456789'
        },
        data=f.read()
    )
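Note that the `f.read()` above loads the whole file into client memory before sending. A minimal sketch of a chunked alternative follows; `iter_chunks` and the 64 KiB chunk size are hypothetical choices for illustration, not part of requests or connexion. It relies on requests sending a generator body with chunked transfer encoding, which the server must accept.

```python
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks from a binary file-like object.

    Hypothetical helper: only one chunk is held in memory at a time.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Self-contained demonstration using an in-memory "file" instead of large.bin.
demo = io.BytesIO(b"x" * 150_000)
chunks = list(iter_chunks(demo))
print([len(c) for c in chunks])
```

With requests this would be passed as `data=iter_chunks(f)` in the `requests.post(...)` call. Passing the open file object `f` itself as `data` also streams the upload and lets requests work out the Content-Length.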
On the server side:
if headers['Content-Type'] == "application/octet-stream":
    # This should work
    data = request.input_stream.read()
    print(len(data))
But the input_stream is empty because connexion has already slurped it up, which leads to problems when dealing with uploads that won't fit in memory (obviously I don't normally just read the stream and print the length).
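For context, this is roughly what a handler would do with an unconsumed stream: copy it to disk in fixed-size chunks instead of calling read() once. It is only a sketch; `copy_stream` and its chunk size are hypothetical, and in-memory buffers stand in for the request stream and the destination file.

```python
import io


def copy_stream(stream, dest, chunk_size=64 * 1024):
    """Copy a readable binary stream into a writable one, chunk by chunk.

    Returns the total number of bytes copied. Hypothetical helper.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the request body and the destination file.
source = io.BytesIO(b"\x00" * 200_000)
sink = io.BytesIO()
copied = copy_stream(source, sink)
print(copied)
```

In a plain Flask handler the source would be `request.stream` and the sink a file opened with `open(path, 'wb')`. This only helps if nothing upstream has already consumed the body, which is exactly the problem described here.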
Additional info:
- python --version: 3.7 (in alpine docker image)
- pip show connexion | grep "^Version\:": 1.4
Thanks for reporting. I don't have clear insights, but I could imagine a Swagger extension to disable validation, or something similar, to prevent slurping the input stream.
A simpler solution might be to inspect the Content-Type: if it shows that validation couldn't possibly do anything useful, IMO connexion should try not to change any state at all.
What do you think about #760?
I thought I had the same question as the OP, but now I'm not sure. Say that even with a JSON content type I upload a huge object (e.g. a 100 GB GeoJSON file): does connexion expose some equivalent of connexion.request.json for streaming data?
I agree that is different from my original use case, but it is related and equally valid. If the upload (JSON or not) is too large to fit in memory (and therefore connexion can't validate it in one hit), we need a way to access the raw data stream.
@hardbyte if we can get down to the "flask level", there is an option for non-RAM large uploads: http://flask.pocoo.org/docs/1.0/patterns/fileuploads/#improving-uploads
So the trick here is figuring out how to turn off validation (maybe using a custom validator?) and then figuring out how to somehow get to the flask level to use the above.
I'm open to any ideas in the short term.
EDIT: @hjacobs @hardbyte do you know if routing JSON to a custom validator that simply does nothing would work, or is that "already too late", meaning the entire upload would already have been loaded into memory?
+1
Hi @hardbyte, I think my issue https://github.com/zalando/connexion/issues/1332 is related. Did you find a way to solve yours? I tried to find a decorator to bypass this but couldn't figure it out.
Not really, I ended up changing that API to allow uploads via S3/Minio.
For now we have patched it internally with something like this:
content_type = flask_request.headers.get("Content-Type")
# [...]
body = flask_request.stream if content_type == "application/octet-stream" else flask_request.get_data()
# [...]
in https://github.com/zalando/connexion/blob/master/connexion/apis/flask_api.py#L235
Probably the same logic can be applied to multipart/form-data.
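The selection logic in that patch can be pulled out into a standalone function to make it easier to reason about. This is a sketch under assumptions, not the actual connexion code: `select_body`, `STREAMING_TYPES` and `FakeRequest` are hypothetical names, and unlike the one-line patch it strips parameters such as `; charset=utf-8` before comparing.

```python
import io

# Media types whose bodies are handed over unread (assumed set; extend as needed).
STREAMING_TYPES = {"application/octet-stream"}


def select_body(request, streaming_types=STREAMING_TYPES):
    """Return the unread stream for streaming types, else the buffered bytes."""
    content_type = request.headers.get("Content-Type", "")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in streaming_types:
        return request.stream
    return request.get_data()


class FakeRequest:
    """Stand-in for flask.request, for demonstration only."""

    def __init__(self, content_type, payload):
        self.headers = {"Content-Type": content_type}
        self.stream = io.BytesIO(payload)

    def get_data(self):
        return self.stream.read()


binary = select_body(FakeRequest("application/octet-stream", b"raw"))
json_body = select_body(FakeRequest("application/json; charset=utf-8", b"{}"))
```

Here `binary` is still a stream the handler can read in chunks, while `json_body` is already plain bytes ready for validation.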
Fixed since https://github.com/spec-first/connexion/pull/1618