Slow processing when receiving large data via POST
Hello,
Being a newbie to Gunicorn, I posted this question on Stack Overflow first, thinking I was getting something wrong, but it looks like it is indeed a problem with Gunicorn and not just an error on my side. I'll describe the problem below; more details and the full code to reproduce are in the Stack Overflow question here: https://stackoverflow.com/questions/67938278/waitress-and-gunicorn-large-data-input-is-much-slower-than-flask-development-ser/68079761
In summary, if a lot of data is sent to Gunicorn via a POST request, it takes a long time to process (compared to, e.g., the Flask development server). A helpful Stack Overflow user found the blocking point to be the following lines:
while size > self.buf.tell():
    data = self.reader.read(1024)
    if not data:
        break
    self.buf.write(data)
inside gunicorn/http/body.py. I confirmed it, and found that if Gunicorn was receiving 30MB of data (on localhost, so very little I/O overhead), this snippet alone was taking 0.2s on my machine.
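For reference, a minimal reproduction looks roughly like the following (my own reconstruction, not the exact code from the Stack Overflow question; the route name and payload size are just examples):

# Minimal app: POST ~30MB to /upload and compare the response time of the
# Flask development server against Gunicorn serving the same app.
from flask import Flask, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    data = request.get_data()  # forces the whole request body to be read
    return "received %d bytes" % len(data)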
Suspecting a slow Python while loop (30MB means the loop runs ~30k times), I tested increasing the amount of data read per step, replacing 1024 with 1024**2 = 1048576. This helped in my case, reducing the time from ~200ms to ~20ms. But I don't know if it's a viable solution (some tests seemed to show it slowed things down when the received data was very small, but further tests didn't confirm this, so I'm not sure anymore). I also haven't checked whether the value is optimal (I only noticed that if I increase it further to 2**25 = 33554432 it gets slower, about 40ms, so it's not true that more is always better, or even that reading the exact amount of data in one go is ideal).
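For what it's worth, the effect of the chunk size can be measured in isolation with a small script like the one below (my own sketch: io.BytesIO stands in for self.reader and self.buf, so no socket I/O is involved):

import io
import time

payload = b"x" * (30 * 1024 * 1024)  # ~30MB body, as in the report above

def copy_body(chunk_size):
    reader = io.BytesIO(payload)
    buf = io.BytesIO()
    size = len(payload)
    start = time.perf_counter()
    # Same loop shape as in gunicorn/http/body.py, with a variable chunk size
    while size > buf.tell():
        data = reader.read(chunk_size)
        if not data:
            break
        buf.write(data)
    return time.perf_counter() - start

for chunk in (1024, 1024 ** 2, 2 ** 25):
    print("chunk=%10d: %.1f ms" % (chunk, copy_body(chunk) * 1000))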
Do you think increasing the amount of data read per step would be feasible? Or would there be a cleaner solution to this problem?
I had the same issue. Gunicorn's performance indeed suffers a lot compared to Flask.
I will open a PR to introduce an environment variable to modify this value. I suggest the name BUF_READ_SIZE for the environment variable.
@bashirmindee do you still plan to work on this PR?
I just forked the repo and modified this piece of code:
import os

# Env var values are strings, so cast to int; the default stays at 1024 bytes
BUF_READ_SIZE = int(os.environ.get("BUF_READ_SIZE", 1024))

while size > self.buf.tell():
    data = self.reader.read(BUF_READ_SIZE)
    if not data:
        break
    self.buf.write(data)
And by setting the env variable BUF_READ_SIZE to 1024*1024, I got the performance benefit for big payloads.
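For reference, the variable has to be set in the environment Gunicorn is started from, e.g. BUF_READ_SIZE=1048576 gunicorn app:app (where app:app is just a placeholder for the actual module and callable).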
Thank you ... IMO this could be a setting also.
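For illustration, a first-class setting could follow the existing pattern in gunicorn/config.py, roughly like the sketch below. This is only a sketch under that assumption: the name buf_read_size is a placeholder rather than an actual Gunicorn option, Setting and validate_pos_int are the helpers already defined in that module, and the value would still need to be wired into the body reader.

class BufReadSize(Setting):
    name = "buf_read_size"
    section = "Server Mechanics"
    cli = ["--buf-read-size"]
    meta = "INT"
    validator = validate_pos_int
    type = int
    default = 1024
    desc = """\
        Number of bytes read from the request body per loop iteration.

        Larger values can significantly speed up large POST bodies at the
        cost of a larger read buffer.
        """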
I'm facing the same issue in production. When do we plan to merge this PR? If any help is required from my side, I am ready to contribute. Thank you.