
Handling streaming request data

Open hardbyte opened this issue 6 years ago • 10 comments

Description

Using just Flask, a user can (manually) use request.input_stream to handle streaming uploads. The use case: I need to handle binary uploads that won't fit in memory. I use connexion's JSON validation features when the Content-Type is application/json, but in this case I have an application/octet-stream.

With the connexion library I can't work out how to access the raw data stream - it looks like the data all gets read in FlaskApi.get_request, called from EndOfRequestLifecycleDecorator, which happens well before any user-defined handler function runs.

Is it possible to not wrap the flask.request with ConnexionRequest for some endpoints?

Expected behaviour

For an unrecognized content type, or for a specific binary stream, connexion should - like Flask - not read() the full request body before calling a handler. Instead the stream should be pulled on demand, usually when the handler accesses request.data or calls request.get_json().
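For comparison, a plain Flask view can already behave this way - a rough sketch (the route name and chunk size below are just illustrative):

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload():
    # Nothing has read the body yet, so the raw stream can be consumed
    # in chunks and the upload never has to fit in memory.
    total = 0
    while True:
        chunk = request.stream.read(64 * 1024)
        if not chunk:
            break
        total += len(chunk)  # or write each chunk to disk / object storage
    return jsonify(received_bytes=total)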

Actual behaviour

Deep in the bowels of connexion the stream gets fully read and I couldn't see how to disable it for one endpoint.

Steps to reproduce

Upload a file e.g. with requests:

import requests

url = 'http://localhost:8080'  # base URL of the connexion app (example)
file_path = 'large.bin'
with open(file_path, 'rb') as f:
    r = requests.post(
        url + '/upload',
        headers={
            'Content-Type': 'application/octet-stream',
            'Content-Length': '123456789'
        },
        # for files that really don't fit in memory, pass `f` itself
        # instead of f.read() so requests streams the body
        data=f.read()
    )

On the server side:

from flask import request

# inside the handler function for /upload
if request.headers['Content-Type'] == "application/octet-stream":
    # This should work: read the raw WSGI input stream
    data = request.input_stream.read()
    print(len(data))

But input_stream is empty because connexion has already slurped it up, which leads to problems when dealing with uploads that won't fit in memory (obviously I don't normally just read the whole stream and print its length).

Additional info:

  • python --version → 3.7 (in an Alpine Docker image)
  • pip show connexion | grep "^Version:" → 1.4

hardbyte avatar Jun 04 '18 10:06 hardbyte

Thanks for reporting. I don't have clear insights, but I could think about a Swagger extension to disable validation, or something similar, to prevent slurping the input stream.

hjacobs avatar Jun 05 '18 17:06 hjacobs

A simpler solution might be to inspect the Content-Type - if it shows that validation couldn't possibly do anything useful, IMO connexion should try not to change any state at all.

hardbyte avatar Nov 08 '18 11:11 hardbyte

What do you think about #760?

dtkav avatar Nov 08 '18 16:11 dtkav

I thought I had the same question as the OP, but now I'm not sure. Even with a JSON content type, if I upload a huge object (say a 100 GB GeoJSON), does connexion expose some streaming equivalent of connexion.request.json?

tommyjcarpenter avatar Jan 14 '19 21:01 tommyjcarpenter

I agree that is different from my original use case, but it's related and equally valid. If the upload (JSON or not) is too large to fit in memory (and therefore connexion can't validate it in one hit), we need a way to access the raw data stream.
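If we had that raw stream, the huge-JSON case could at least be handled outside connexion with an incremental parser - a rough sketch using the third-party ijson package (not something connexion provides; the 'features' prefix assumes a GeoJSON-style document):

import ijson  # pip install ijson - incremental JSON parser

def count_features(stream):
    # Iterate over each element of the top-level "features" array without
    # ever materialising the whole document in memory.
    count = 0
    for _feature in ijson.items(stream, 'features.item'):
        count += 1
    return count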

hardbyte avatar Jan 15 '19 06:01 hardbyte

@hardbyte if we can get down to the "flask level", there is an option for non-RAM large uploads: http://flask.pocoo.org/docs/1.0/patterns/fileuploads/#improving-uploads

So the trick here is figuring out how to turn off validation (maybe using a custom validator?) and then figuring out how to somehow get to the flask level to use the above.
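If we could reach the raw stream, the Flask-level handling itself looks fairly simple - an untested sketch, assuming nothing upstream (i.e. connexion's validation) has consumed request.stream yet:

import shutil
import tempfile

from flask import request

def save_upload_to_disk():
    # Copy the raw body to a temporary file in fixed-size chunks so the
    # upload never has to fit in RAM.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(request.stream, tmp, length=64 * 1024)
        return tmp.name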

I'm open to any ideas in the short term.

EDIT: @hjacobs @hardbyte do you know if routing JSON to a custom validator that simply does nothing would work, or is it "already too late", meaning the entire upload would already have been read into memory?

tommyjcarpenter avatar Jan 15 '19 14:01 tommyjcarpenter

+1

klorenz avatar Apr 04 '19 03:04 klorenz

Hi @hardbyte, I think my issue https://github.com/zalando/connexion/issues/1332 is related - did you find a way to solve yours? I tried to find a decorator to bypass the behaviour but couldn't figure it out.

MajorSquirrelTVS avatar Jan 19 '21 15:01 MajorSquirrelTVS

Not really, I ended up changing that API to allow uploads via S3/Minio.
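(Roughly: the API now hands the client a presigned PUT URL and the client uploads straight to object storage, so connexion never sees the bytes. A sketch with boto3 - the bucket and key names here are made up:)

import boto3

def create_upload_url(bucket='uploads', key='large.bin', expires_in=3600):
    # Works against AWS S3 or a Minio endpoint (pass endpoint_url=... to
    # boto3.client for Minio). The client PUTs the file directly to this URL.
    s3 = boto3.client('s3')
    return s3.generate_presigned_url(
        'put_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires_in,
    )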

hardbyte avatar Jan 19 '21 18:01 hardbyte

For now we have patched it internally with something like this:

content_type = flask_request.headers.get("Content-Type")
# [...]
body = flask_request.stream if content_type == "application/octet-stream" else flask_request.get_data()
# [...]

in https://github.com/zalando/connexion/blob/master/connexion/apis/flask_api.py#L235

The same logic could probably be applied for multipart/form-data.

MatteoRagni avatar Feb 08 '21 13:02 MatteoRagni

Fixed since https://github.com/spec-first/connexion/pull/1618

RobbeSneyders avatar Feb 18 '23 10:02 RobbeSneyders