connexion icon indicating copy to clipboard operation
connexion copied to clipboard

Mark endpoints for streaming uploads

Open tst-lsavoie opened this issue 3 years ago • 6 comments

Fixes #592 based on the suggestion by @MatteoRagni: https://github.com/zalando/connexion/issues/592#issuecomment-775153162. Also Fixes #1332, I think.

Changes proposed in this pull request:

  • Adds an x-stream-upload option for requests. When this is specified, connexion will pass flask.request.stream as the request.body instead of calling get_data(). This allows streaming uploads to disk instead of reading the entire file into memory.

tst-lsavoie avatar Jun 25 '21 21:06 tst-lsavoie

Hi @tst-lsavoie, thanks for the PR.

  • I'm a bit hesitant to define a custom extension. Is there a reason we cannot inspect Content-Type instead?
  • We moved to the main branch. Can you rebase or merge and change the target branch of the PR?

RobbeSneyders avatar Jun 30 '21 23:06 RobbeSneyders

@RobbeSneyders I didn't use the Content-Type because it's less reliable since a client could leave the content type out or set it incorrectly. There's also the question of which content types should be streamed, which might be different for each application, or even different for different endpoints in the same application. I don't love the idea of adding a new extension either but it seemed like the least bad option.

I changed the base branch to main but it looks like there are non-trivial merge conflicts I need to figure out. I'm going to hold off on that until we have a clearer picture of how the PR will need to change.

tst-lsavoie avatar Jul 01 '21 21:07 tst-lsavoie

I've been looking into this a bit more, and I would suggest the following solution.

Define a body property with body_getter on ConnexionRequest, similar to the json property with json_getter.

https://github.com/zalando/connexion/blob/f46551cee8056dca65c1177e8252f19805d95d50/connexion/lifecycle.py#L25-L27

We can then assign request.get_data as the boddy_getter for Flask, which would delay stream consumption until the body property is evaluated. There's only two places in Connexion where request.body is evaluated:

  1. During request body validation. Here the body is only consumed if Connexion supports validation for the provided Content-Type, which is currently only for json or form: https://github.com/zalando/connexion/blob/f46551cee8056dca65c1177e8252f19805d95d50/connexion/decorators/validation.py#L137-L190

  2. During parameter parsing. Here the body is currently always consumed: https://github.com/zalando/connexion/blob/f46551cee8056dca65c1177e8252f19805d95d50/connexion/decorators/parameter.py#L79-L84 But it's actually not used if the body parameter is not defined in the view function parameters: https://github.com/zalando/connexion/blob/f46551cee8056dca65c1177e8252f19805d95d50/connexion/operations/swagger2.py#L277-L282 If we refactor the parameter parsing a bit, we can prevent body consumption here as well if the body parameter is not present in the view function parameters.

You can then prevent stream consumption in connexion by

  • Defining a Content-Type for which Connexion does not support validation, this should be the case for all streaming types
  • Not defining a body parameter in the view function

WDYT?

RobbeSneyders avatar Jul 03 '21 14:07 RobbeSneyders

You can find a quick and dirty PoC of the solution here: https://github.com/thermopylae/connexion/compare/mark-endpoints-for-streaming-uploads...spec-first:streaming-uploads-poc

RobbeSneyders avatar Jul 03 '21 14:07 RobbeSneyders

I think I understand the proposed solution. So, if the content type was not form or JSON, and if the user didn't add a parameter for the body to their function, then the stream would be available.

I could see that working. I worry about it for the future, though. One small, innocent looking update that accesses the request body could break it. Hopefully testing would catch that, but it's not guaranteed.

We'd have to do some refactoring for the OpenAPI 3 version of the code as well: https://github.com/zalando/connexion/blob/c014299435990ff1d4c20596396ca563f7b266b4/connexion/operations/openapi.py#L270-L300

The biggest problem is going to be time. I'm not sure when I'll next have a good chunk of time to focus on this issue.

tst-lsavoie avatar Jul 07 '21 23:07 tst-lsavoie

This is great!

I have used it marking my endpoint with x-stream-upload and work like a charm, thanks :)

clementperon avatar Mar 24 '22 16:03 clementperon

Fixed by https://github.com/spec-first/connexion/pull/1618

RobbeSneyders avatar Feb 18 '23 10:02 RobbeSneyders