Slow processing when receiving large data via POST

Open mspinaci opened this issue 3 years ago • 5 comments

Hello,

Being a newbie to Gunicorn, I posted this question on Stackoverflow first, thinking I was getting something wrong, but it looks like it's indeed a problem with Gunicorn and not just an error on my side. I'll describe the problem below; more details and the full code to reproduce are in the Stackoverflow question here: https://stackoverflow.com/questions/67938278/waitress-and-gunicorn-large-data-input-is-much-slower-than-flask-development-ser/68079761

In summary, if a lot of data is sent to Gunicorn via a POST request, it takes a long time to process (compared to e.g. the Flask development server). A helpful Stackoverflow user found the bottleneck to be the following lines:

while size > self.buf.tell():
    data = self.reader.read(1024)
    if not data:
        break
    self.buf.write(data)

inside gunicorn/http/body.py. I confirmed it, and found that if Gunicorn was receiving 30MB of data (on localhost, so very little I/O overhead), this snippet alone was taking 0.2s on my machine.

Suspecting a slow Python while loop (30MB means the loop runs ~30k times), I tested increasing the amount of data read per step, replacing 1024 by 1024**2 = 1048576. This helped in my case, reducing the time from ~200ms to ~20ms. But I don't know if it's a viable solution (some tests seemed to show that it slowed things down when the received data was very small, but further tests didn't confirm this, so I'm not sure anymore). I also haven't checked whether the value is optimal; I only noticed that if I increase it further to 2**25 = 33554432 it gets slower, about 40ms, so it's not true that more is always better, or even that reading the exact amount of data in one go is ideal.

Do you think increasing the amount of data read per step would be feasible? Or is there a cleaner solution to this problem?
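For reference, the chunk-size effect can be reproduced in isolation with a small timing script like the one below (a rough sketch, not Gunicorn code: the 30MB payload mirrors the POST body above, and the two chunk sizes are the values discussed):

import io
import time

PAYLOAD = b"x" * (30 * 1024 * 1024)  # ~30MB, roughly the POST body size discussed above

def copy_with_chunk_size(chunk_size):
    # Mimics the buffered copy loop in gunicorn/http/body.py
    reader = io.BytesIO(PAYLOAD)
    buf = io.BytesIO()
    size = len(PAYLOAD)
    start = time.perf_counter()
    while size > buf.tell():
        data = reader.read(chunk_size)
        if not data:
            break
        buf.write(data)
    return time.perf_counter() - start

for chunk in (1024, 1024 * 1024):
    print("chunk=%d: %.3fs" % (chunk, copy_with_chunk_size(chunk)))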

mspinaci avatar Jun 22 '21 15:06 mspinaci

I had the same issue. Gunicorn's performance indeed suffers a lot compared to the Flask development server.

I will open a PR to introduce an environment variable for modifying this value. I suggest the name BUF_READ_SIZE for the environment variable.

bashirmindee avatar Jan 03 '22 13:01 bashirmindee

@bashirmindee do you still plan to work on this PR?

fjsj avatar Nov 03 '22 14:11 fjsj

I just forked the repo and modified this piece of code:

import os  # needed to read the environment variable

# cast to int: environment variables are strings
BUF_READ_SIZE = int(os.environ.get("BUF_READ_SIZE", 1024))
while size > self.buf.tell():
    data = self.reader.read(BUF_READ_SIZE)
    if not data:
        break
    self.buf.write(data)

And by setting the environment variable BUF_READ_SIZE to 1024*1024, I got the performance benefit for big payloads.
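To check whether the environment variable actually helps, a quick end-to-end timing sketch on the client side (it assumes a Gunicorn server is already running on localhost:8000 with a route that accepts POST; the URL and payload size are placeholders):

import time
import urllib.request

payload = b"x" * (30 * 1024 * 1024)  # ~30MB body, as in the original report
req = urllib.request.Request("http://127.0.0.1:8000/", data=payload, method="POST")

start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    resp.read()
print("POST of %d bytes took %.3fs" % (len(payload), time.perf_counter() - start))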

bashirmindee avatar Nov 14 '22 08:11 bashirmindee

Thank you ... IMO this could be a setting also.
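For what it's worth, a rough sketch of what a first-class setting could look like, following the Setting-subclass pattern in gunicorn/config.py (buf_read_size, its CLI flag, default and description are hypothetical, not an existing upstream option):

# Hypothetical sketch only -- not an existing Gunicorn option.
from gunicorn.config import Setting, validate_pos_int

class BufReadSize(Setting):
    name = "buf_read_size"          # hypothetical setting name
    section = "Worker Processes"
    cli = ["--buf-read-size"]
    meta = "INT"
    validator = validate_pos_int
    type = int
    default = 1024                  # matches the current hard-coded chunk size
    desc = """\
        Number of bytes read per iteration when buffering a request body.
        """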

benoitc avatar May 07 '23 19:05 benoitc

I'm facing the same issue in production. When do we plan to merge this PR? If any help is required from my side, I am ready to contribute. Thank you.

m-aamirmumtaz avatar Jan 12 '24 19:01 m-aamirmumtaz