
RESPONSE.write does not stream data under WSGI

Open perrinjerome opened this issue 5 years ago • 19 comments

What seems to be a regression in the ZServer -> WSGI migration is that RESPONSE.write no longer sends the response data to the client as a stream; instead, everything is sent only when the published method returns.

For example, take this Script (Python) (with id stream_test at the root of the application):

request = container.REQUEST
response = request.RESPONSE

for i in range(1000):
    for x in range(1000):
        y = x ** x  # this introduces a delay
    response.write(b'.')

If I use curl --no-buffer http://127.0.0.1:8080/stream_test, with ZServer we can see the . printed one after the other, but with WSGI we see all the ..... at once, after a delay.

RESPONSE.write is (still) documented in the Zope book as a way to stream data: https://github.com/zopefoundation/Zope/blob/e9f10101b8a4db64a2440e3a51561a7805ea44f5/docs/zdgbook/ObjectPublishing.rst#L1258-L1267

It's also used in OFS.Image.File to stream file chunks:

https://github.com/zopefoundation/Zope/blob/b9a346903fe4a6874face0930876184ac321853e/src/OFS/Image.py#L279-L298

The WSGIResponse.write docstring also states that "This allows the browser to display partial results while computation of a response to proceed", but that is currently not the case.
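
The behaviour looks as if write() merely appends to an in-memory buffer that is only handed to the server at the end. A simplified sketch of that (not the actual Zope source):

from io import BytesIO

class BufferingResponse:
    """Simplified model of a non-streaming response object."""

    def __init__(self):
        self.stdout = BytesIO()

    def write(self, data):
        # Nothing reaches the client here; the data just accumulates...
        self.stdout.write(data)

    def body(self):
        # ...and the whole body is sent only after the published
        # method returns.
        return self.stdout.getvalue()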

perrinjerome avatar Jun 25 '19 02:06 perrinjerome

Jérome Perrin wrote at 2019-6-24 19:07 -0700:

What seems to be a regression in the ZServer -> WSGI migration is that RESPONSE.write no longer sends the response data to the client as a stream [...]

Your example requires a lot of support (which may not all be in place): in order to use Response.write, the final Content-Length usually must be known. The reason for this: even if the response is streamed, the response header must be produced before the body starts. Without a valid Content-Length, the server must either use a chunked response or close the communication channel to indicate that the response is finished.
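
For illustration, a chunked response frames each body part with its size in hexadecimal and ends with a zero-length chunk, so no up-front Content-Length is needed; the dots from the example above would travel roughly like this (each size line and chunk terminated by CRLF):

HTTP/1.1 200 OK
Transfer-Encoding: chunked

1
.
1
.
0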

For efficiency reasons, the server (component) might cache some output before it decides what to do.

WSGI itself provides strong control over the delivery of response parts; thus, in principle, you can control streaming precisely. However, WSGI support for Response.write requires a special (WSGI) interface method whose use WSGI discourages. The ZPublisher integration may support only the typical use case for Response.write (avoiding the construction of a huge structure in memory) and not precise streaming control.

Note that, in general, Response.write should not wait until the chunk is delivered (which WSGI's streaming control would enforce).

Note that ZServer (unlike WSGI) does not provide precise streaming control: Response.write simply hands a body part to ZServer, which may deliver it at its discretion.

d-maurer avatar Jun 25 '19 05:06 d-maurer

Thanks for the feedback, I did not know about that "discouraged interface". I assume this is the "deprecated method" from the examples in https://stackoverflow.com/a/16775731 , i.e.:

def application(environ, start_response):
    status = '200 OK'
    headers = [('Content-Type', 'text/plain')]
    # start_response() returns the write callable that PEP 3333 deprecates
    write = start_response(status, headers)
    write(b'content block 1')
    write(b'content block 2')
    write(b'content block 3')
    return []  # PEP 3333 requires an iterable, not None

If I try a quick and dirty patch doing something like this:

response.write = start_response(status, headers)
...

then the response data seems properly sent chunk by chunk with waitress, and other things seem to work ... but that's a discouraged approach.

About the Content-Length header: methods can set it before starting to write; this is what OFS.Image does: https://github.com/zopefoundation/Zope/blob/b9a346903fe4a6874face0930876184ac321853e/src/OFS/Image.py#L264
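
A minimal sketch of the pattern (index_html is modeled on OFS.Image.File; self.read_chunks() is a hypothetical stand-in for walking the stored data chunks):

def index_html(self, REQUEST, RESPONSE):
    # Announcing Content-Length before the first write() lets the
    # server delimit the body without chunked encoding or a close.
    RESPONSE.setHeader('Content-Length', str(self.size))
    RESPONSE.setHeader('Content-Type', 'application/octet-stream')
    for chunk in self.read_chunks():  # hypothetical chunk source
        RESPONSE.write(chunk)
    return ''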

Not sending a Content-Length header and letting the client wait for the server to close the connection is valid, in my understanding of https://tools.ietf.org/html/rfc7230#section-3.3.3 .

The use case I have in mind is the one you could guess: preventing the server from having to keep the full structure in memory, like OFS.Image does. Sending data in chunks with exactly the same granularity as they are written to the response is not so important, I guess.

I'm still wondering if there's a way for WSGIPublisher.publish_module to return a generator that would yield from what is "pushed" by response.write and then what the published method returns ... I assume this would be the recommended WSGI way.
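
One conceivable bridge from the push-style write() to a pull-style generator would be a bounded queue fed by a worker thread; a rough sketch, where publish and request stand in for the real WSGIPublisher internals:

import queue
import threading

_END = object()  # sentinel marking the end of the response body

def iter_written(publish, request):
    # Run the publication in a worker thread; every response.write()
    # pushes a chunk that this generator yields to the WSGI server.
    chunks = queue.Queue(maxsize=8)  # bounded, to apply back-pressure

    def worker():
        try:
            request.response.write = chunks.put
            body = publish(request)  # whatever the published method returns
            if body:
                chunks.put(body)
        finally:
            chunks.put(_END)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        chunk = chunks.get()
        if chunk is _END:
            return
        yield chunk

The obvious cost would be a second thread per streamed request.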

perrinjerome avatar Jun 25 '19 07:06 perrinjerome

Hi @perrinjerome, when I was going through the documentation at the Zope sprint last month I am pretty sure I tried to run the code from the example, but obviously I did not check the "streaming" feature. I am sorry!

Once a proper solution is found, please update the documentation accordingly.

@d-maurer Thanks for your explanation. In my Zope app there once was a newsletter module which streamed the response and at some point stopped working, even under Zope 2.13; your hint may explain why. Meanwhile the newsletter module has been deleted anyway.

jugmac00 avatar Jun 25 '19 08:06 jugmac00

I'm still wondering if there's a way for WSGIPublisher.publish_module to return a generator that would yield from what is "pushed" by response.write and then what the published method returns ... I assume this would be the recommended WSGI way.

I don't see how that can be done. I know of nothing in Python that turns a classic call stack (from mapply to RESPONSE.write) into a kind of coroutine.

then the response data seems properly sent chunk by chunk with waitress, and other things seem to work ... but that's a discouraged approach.

Then, if every approach is discouraged, let's rather remove RESPONSE.write.

With the switch from Medusa to WSGI, we already broke all applications that stream data and I suppose many have already switched to stream iterators.

Another problem with RESPONSE.write is that it's easy to use it for something other than streaming data, and that's a bad idea. It allows strange things like calling write() and also returning data: which should be output first? If write() does not stream anymore, we had better reply properly when an exception happens after any call to write(), i.e. ignore all calls to write() (currently we can get the strange behaviour that some written data is followed by a truncated error page). To sum up, RESPONSE.write is useless and error-prone.
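
A sketch of the ambiguity, with a hypothetical published method:

def confusing(self, RESPONSE):
    RESPONSE.write(b'written first?')
    # Which bytes should the client receive first, and what should
    # happen if an exception is raised between the two?
    return b'or returned first?'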

jmuchemb avatar Nov 05 '19 15:11 jmuchemb

Julien Muchembled wrote at 2019-11-5 08:12 -0800:

... Then, if every approach is discouraged, let's rather remove RESPONSE.write.

I still (current master) see a use of Response.write in OFS.Image.File (line 388 and others). It is used there to decouple the ZODB use from the delivery of the file content -- such that downloading a huge file does not need a ZODB connection during the whole download time.

In my view, we should think about removing Response.write only after at least this use case has another good solution.

d-maurer avatar Nov 05 '19 18:11 d-maurer

We are talking about the fact that your use case is already broken: under WSGI, there is no difference between:

  • using Response.write
  • making your OFS.Image.File code instantiate an io.BytesIO() object, fill it, and return .getvalue()

BTW, I am surprised by your example because one usually prefers the contrary: streaming data directly from the DB to the user agent, without first reading everything into RAM or onto disk (someone pausing the download of a big file should cost nothing besides a few file descriptors). For this use case, there's nothing left: Response.write is broken, and existing iterators are processed after the DB connection is closed. That's actually the first thing we plan to work on (a discussion was started at https://lab.nexedi.com/nexedi/erp5/merge_requests/883#note_84239).

In my view, we should think about removing Response.write only after at least this use case has another good solution.

Sure. It's premature to clean up code if there's no other good solution anyway. But my comment is rather meant to announce the direction I'd like us to choose.

jmuchemb avatar Nov 05 '19 19:11 jmuchemb

Julien Muchembled wrote at 2019-11-5 11:32 -0800:

We are talking about the fact that your use case is already broken: under WSGI, there is no difference between:

  • using Response.write
  • making your OFS.Image.File code instantiate an io.BytesIO() object, fill it, and return .getvalue()

Then, likely, the WSGI interface should improve!

BTW, I am surprised by your example because one usually prefers the contrary: streaming data directly from the DB to the user agent, without first reading everything into RAM or onto disk (someone pausing the download of a big file should cost nothing besides a few file descriptors).

Of course, this requires a database connection -- during the whole delivery of the file content. The required time depends on the bandwidth, which is not under the control of the server. With Zope, database connection management is implicit: as long as a database connection is open, a worker thread is blocked. It is therefore vital to make the time the connection needs to stay open independent of things outside the server's control.

The ZServer interface was aware of these issues and supported them (by transparently buffering, if necessary). The WSGI interface should do so as well. In my opinion, Response.write is an adequate interface for this.

d-maurer avatar Nov 05 '19 22:11 d-maurer

It looks like the future is ASGI, and if a solution to this issue brings us closer to ASGI, we should choose it. I looked at the WSGI compatibility layer of https://github.com/encode/uvicorn and its start_response does not return a write callable, probably because PEP 3333 deprecates it. Changing Uvicorn to return write would be trivial, but I wouldn't be surprised if they refused such a patch.

IOW, there are WSGI implementations that simply don't provide the write API, and if we don't have good arguments to convince all of them to implement it, we had better give up now.

The ZServer interface was aware of these issues and supported them (by transparently buffering, if necessary)

I didn't know about the transparent buffering for content bigger than 128k (each chunk bigger than 200 bytes goes via a temporary file). That's actually something we don't want, so it should not be hardcoded. By trying to avoid reaching the maximum number of ZODB connections, you get a lot of IO and you could fill the disk. That can also cause useless workload if the client decides after all to cancel its request.

I don't see the problem with increasing the maximum number of ZODB connections and workers (currently threads but not necessarily in the future).

jmuchemb avatar Nov 14 '19 15:11 jmuchemb

I've just looked at the code of waitress (that's what we chose for WSGI) and it does the same kind of transparent buffering as ZServer. What's strange is that it also buffers to file for unbounded streams. All this looks bad to me :(

jmuchemb avatar Nov 14 '19 17:11 jmuchemb

Julien Muchembled wrote at 2019-11-14 07:32 -0800:

...

The ZServer interface was aware of these issues and supported them (by transparently buffering, if necessary).

I didn't know about the transparent buffering for content bigger than 128k (each chunk bigger than 200 bytes goes via a temporary file). That's actually something we don't want, so it should not be hardcoded.

The buffering is an elegant way to decouple request processing from response delivery. Decoupling is a good thing, as response delivery can take arbitrarily long (depending on actions on the client side and the available bandwidth).

By trying to avoid reaching the maximum number of ZODB connections, you get a lot of IO and you could fill the disk.

Otherwise, you use a large number of ZODB connections (limited by the availability of file descriptors), and those are much rarer than disk space.

If you do not decouple response delivery from request processing, I can easily design a DOS attack: perform a large number of requests and refuse to read the responses. Thus, you must have some form of decoupling; disk buffering is one of the easiest ways.

d-maurer avatar Nov 14 '19 20:11 d-maurer

Otherwise, you use a large number of ZODB connections (limited by the availability of file descriptors)

I was wrong in a previous comment. A ZODB Connection (or a thread) does not use any file descriptor (what may use file descriptors is the ZODB Storage from which Connections get data, and the number of file descriptors does not depend on the number of Connections). You may argue that this is specific to ZODB and that other DBs, like MariaDB, would use one socket per connection.

On the other hand, current implementations of transparent buffering use one fd per request (for the temporary file). Sure, this could be improved with something more complex, like a single sqlite3 DB buffering data for all requests and even freeing data as soon as it's sent to the client.

I can easily design a DOS attack: perform a large number of requests and refuse to read the responses.

Currently, it looks like the decoupling was not implemented for this reason: at least the fd argument is in favor of my position.

Anyway, that's not where we can protect from a DoS attack. By not decoupling, it is indeed trivial to reach the maximum number of requests if one finds a request that returns too much data. With decoupling, 1 request finishes from time to time, giving some client a chance to get a slot, but frankly, the attacker can simply continue to open requests. I doubt decoupling helps significantly here (I mean the site is down for most users), whereas the cost is high: disk filled, wasted IO/CPU to fill the buffer.

jmuchemb avatar Nov 14 '19 23:11 jmuchemb

Julien Muchembled wrote at 2019-11-14 15:51 -0800:

Otherwise, you use an large amount of ZODB connections (limited by the availability of file descriptors)

I was wrong in a previous comment. A ZODB Connection (or a thread) does not use any file descriptor

I am aware of this. My remark "limited by the availability of file descriptors" refers to my DOS scenario -- create lots of requests and refuse to read the responses -- each such request uses a connection to the server and thus a file descriptor.

... Currently, it looks like the decoupling was not implemented for this reason: at least the fd argument is in favor of my position.

OFS.Image.File uses Response.write in order to decouple the request processing (bound to one of very few worker threads) from the response delivery (with its timing completely outside the control of the server).

When I looked at the ZServer code, I got the impression that many of its design decisions were influenced by the uncontrollable delivery time; among them the disk buffering for large responses. Think about it: if the server produces more data than the client is ready to consume, something must be done with the excess. Something like ZServer can either drop it (potentially punishing well-behaved clients in unfavorable situations, e.g. small bandwidth) or keep it. Apparently, ZServer decided to keep it, and to use disk space as a relatively cheap resource for this.

I am aware that you would like e.g. in the case of OFS.Image.File to use a stream iterator in order to interleave reading from the ZODB and delivering partial data (and thus avoiding large intermediate data). This would require a new ZODB connection (not associated with the current request) and non trivial transaction handling (to ensure that the new ZODB connection sees potential modifications by the current request). It can be done but it is considerably more difficult than the use of Response.write.

Anyway, that's not where we can protect from a DoS attack. By not decoupling, it is indeed trivial to reach the maximum number of requests if one finds a request that returns too much data. With decoupling, 1 request finishes from time to time, giving some client a chance to get a slot, but frankly, the attacker can simply continue to open requests. I doubt decoupling helps significantly here (I mean the site is down for most users), whereas the cost is high: disk filled, wasted IO/CPU to fill the buffer.

The decoupling was likely not done to prevent DOS attacks but to ensure satisfactory behaviour despite delivery odds (like low bandwidth, temporary network problems, etc.) at low cost (using disk space rather than RAM).

d-maurer avatar Nov 15 '19 09:11 d-maurer

A ZODB Connection (or a thread) does not use any file descriptor (what may use file descriptors is the ZODB Storage from which Connections get data, and the number of file descriptors does not depend on the number of Connections)

That depends on the storage implementation. A storage that natively implements IMVCCStorage has a new instance created for each Connection. Those new instances may require new sockets. RelStorage is an IMVCCStorage and may require up to two sockets per ZODB Connection.

jamadden avatar Nov 15 '19 12:11 jamadden

Oops, I forgot RelStorage.

bound to one of very few worker threads

Because of the GIL and the connection cache, one may indeed limit the number of threads within a process that serves heterogeneous requests. Sometimes it's possible to specialize processes using a balancer, and then you can configure as many threads as you want, with a small connection cache to avoid wasting RAM.

(With everything that has been said previously, I assume FDs aren't an issue: there's already one for the socket to the client, so an extra one for the DB is of the same order of magnitude; current buffering implementations are bad; and the limit on FDs can be increased.)

I am aware that you would like e.g. in the case of OFS.Image.File to use a stream iterator in order to interleave reading from the ZODB and delivering partial data (and thus avoiding large intermediate data). This would require a new ZODB connection (not associated with the current request) and non trivial transaction handling (to ensure that the new ZODB connection sees potential modifications by the current request). It can be done but it is considerably more difficult than the use of Response.write.

This does not require a new ZODB connection. This can be done with the same connection as the one used for the request, by postponing the closing of the connection until after the iteration is over. This is trivial with the deprecated WSGI write API. With the recommended WSGI API, it is a bit tricky, as you can see at https://lab.nexedi.com/nexedi/erp5/commit/4ca4d7f65393d2caf08abfa774ef9a52b5497f63: the iterator is wrapped in an auto_close_app_iter function that closes the ZODB connection in a finally clause, plus something else to switch to this way of closing.
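
A minimal sketch of that wrapper pattern (close_zodb_connection is a hypothetical stand-in; the real code is in the commit above):

def auto_close_app_iter(app_iter, close_zodb_connection):
    # Yield the response chunks and close the ZODB connection only
    # once the WSGI server has finished iterating, or has closed the
    # generator early because the client disconnected.
    try:
        for chunk in app_iter:
            yield chunk
    finally:
        close_zodb_connection()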

You also seem to mix up the use of write and buffering. These are two independent things (modulo what I wrote in the previous paragraph). Removing Response.write does not prevent anything: it remains possible to buffer or not.

I didn't know that ZServer and waitress already do buffering, which means that buffering is off-topic here (if I want that feature removed, I'll discuss it elsewhere) and that WSGIPublisher does redundant buffering.

So the way I'd like this issue to be resolved is, at least:

  • removing Response.write
  • removing the use of BytesIO from WSGIPublisher.py

jmuchemb avatar Nov 18 '19 19:11 jmuchemb

Julien Muchembled wrote at 2019-11-18 11:04 -0800:

...

I am aware that you would like e.g. in the case of OFS.Image.File to use a stream iterator in order to interleave reading from the ZODB and delivering partial data (and thus avoiding large intermediate data). This would require a new ZODB connection (not associated with the current request) and non trivial transaction handling (to ensure that the new ZODB connection sees potential modifications by the current request). It can be done but it is considerably more difficult than the use of Response.write.

This does not require a new ZODB connection. This can be done with the same connection as the one used for the request, by postponing the closing of the connection until after the iteration is over.

Currently, there is an automatism which couples the request, the ZODB connection, and the worker thread. If you drop Response.write in favor of keeping the ZODB connection open longer, you must invent something that lets you control this automatism, and I have my doubts that this will be more natural.

... So the way I'd like this issue to be resolved is, at least:

  • removing Response.write
  • removing the use of BytesIO from WSGIPublisher.py

Apparently, someone needed a streaming facility and found that Response.write does not provide it under WSGI (unlike under ZServer). The most natural way would be to make Response.write stream under WSGI as well, not to drop Response.write altogether.

Any solution you come up with must at least accommodate the use case in OFS.Image.File well. Show us such a complete solution and maybe it will be convincing...

d-maurer avatar Nov 19 '19 00:11 d-maurer

BTW, @d-maurer which WSGI implementations do you use?

jmuchemb avatar Nov 19 '19 15:11 jmuchemb

Julien Muchembled wrote at 2019-11-19 07:37 -0800:

BTW, @d-maurer which WSGI implementations do you use?

The default one (i.e. waitress).

d-maurer avatar Nov 19 '19 17:11 d-maurer

This discussion focuses on the use case of OFS.Image.File, delivering data from the ZODB. But that might not be the only use case. We use a (not version-controlled) file repository on the hard disk, with methods that allow reading and delivering files from this very specific directory. This might have the same problems, but when delivering gigabytes of data to a client, Response.write is the only option, isn't it?
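
Or would a stream iterator cover this? A sketch using ZPublisher.Iterators.filestream_iterator (serve_file is a hypothetical method of such a repository object), so the server consumes the file outside the worker thread:

import os
from ZPublisher.Iterators import filestream_iterator

def serve_file(self, path, RESPONSE):
    # Hand the file to the server as a stream iterator instead of
    # write()ing it chunk by chunk from the worker thread.
    RESPONSE.setHeader('Content-Type', 'application/octet-stream')
    RESPONSE.setHeader('Content-Length', str(os.path.getsize(path)))
    return filestream_iterator(path, 'rb')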

viktordick avatar Jan 07 '20 05:01 viktordick

FYI: we added an item to our roadmap (at PerFact) saying that (possibly large) static files should no longer be served by Zope, relying instead on a lighttpd running in parallel (maybe with Zope handling the authentication). We are just under the impression that this use case is not one of Zope's strong points, in part due to the issue discussed here.

viktordick avatar Jun 10 '23 11:06 viktordick