python-web-perf
More coroutines
My attempt of narrowing down on cases where async is a clear better choice than Flask. Please see notes.md
I'm not able to replicate your results, and I suspect the reason is that you have the number of workers held too low for gunicorn to operate effectively. Here's my reasoning first, then my results:
When you run an asyncio worker (or any async-based worker), it gets implicit freedom to use its green threads as much as it likes, and most of the time an async worker will saturate a single core. That is probably what you're seeing locally: two maxed-out CPU cores because you've set PWPWORKERS to 2.
When you run gunicorn, because it uses operating system processes instead of green threads, it only gets the parallelism that you grant it via the number of workers you launch. If you only run two workers, it only gets two threads of execution, and whenever one of those threads does IO, it blocks. The worker count needs to be high enough to saturate the CPU resources under load. How high that is depends on the app, but a reasonable number here seems to be 3-4 workers per CPU core. (Without setting CPU affinity or something similar, it won't be possible to do a good test that uses only some CPU cores!)
Consequently you have limited the parallelism for gunicorn in a way that you haven't for uvicorn, so it's no longer a fair test.
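A rough way to pick a sync worker count along these lines (the 3-4x multiplier is the rule of thumb above, not something the repo hardcodes):

```python
import multiprocessing

def suggested_sync_workers(per_core: int = 3) -> int:
    """Rule-of-thumb worker count for a blocking (sync) server:
    several OS processes per core, so that while some workers are
    blocked on IO, the others can keep the CPU busy."""
    return multiprocessing.cpu_count() * per_core

# e.g. use this as gunicorn's --workers / the PWPWORKERS value
print(suggested_sync_workers())
```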
My data, from some local runs on my laptop, where I gave gunicorn and uWSGI enough workers to saturate the CPU:
| Server | p50 (ms) | p99 (ms) | Reqs/s |
|---|---|---|---|
| uWSGI+Bottle | 33 | 47 | 2961 |
| Flask+Gunicorn | 50 | 65 | 1973 |
| Sanic+Uvicorn | 23 | 190 | 2441 |
(not the proper test env, I edited the Bottle app myself to match)
These results are approximately in line with my full tests, in which Sanic+Uvicorn does more requests per second than Flask but latency is much more variable. Bear in mind that Sanic gets to use ujson and none of the others do!
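For reference, the p50/p99 figures in tables like this can be computed from raw per-request latencies; a minimal sketch with made-up numbers (not real benchmark data):

```python
import statistics

# Made-up latency samples in milliseconds -- not real benchmark data
latencies_ms = [22, 23, 24, 24, 25, 26, 28, 30, 180, 190]

# statistics.quantiles with n=100 returns 99 cut points: index 49 is
# the 50th percentile (median), index 98 the 99th percentile
qs = statistics.quantiles(latencies_ms, n=100)
p50, p99 = qs[49], qs[98]
print(f"p50={p50:.1f}ms p99={p99:.1f}ms")
```

It also illustrates why a low p50 can coexist with a high p99: a handful of slow outliers dominate the tail without moving the median.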
Full run data
Two other things that look odd with your runs:
- You aren't putting nginx in front, which I suspect makes a difference
- For whatever reason, it seems to run an order of magnitude slower on your machine, e.g. uvicorn+sanic's p50 latency is 269ms, which seems suspicious when mine is 23ms. I'm running on a discontinued midrange Intel CPU from 2014, so I would have thought your Mac would be quicker.
Also, you corrected a port in async_db.py, which makes me suspicious that the async code didn't run behind pgbouncer in the test (I can't remember if I fixed that in a local edit), so I'll re-run the async code in the full environment to double-check before I publish my post.
> When you run an asyncio worker (or any async-based worker), it gets implicit freedom to use its green threads as much as it likes, and most of the time an async worker will saturate a single core. That is probably what you're seeing locally: two maxed-out CPU cores because you've set PWPWORKERS to 2.
Very fair point. This ran on my personal 2017 Mac, which has 4 cores, AFAIK (the new one from Bulb has 6). I intentionally kept the number of workers low to see the difference in a constrained environment. I even tested with 1, but got bored waiting.
Why would nginx matter? Since it's only a pass-through proxy, it's just an extra moving piece.
The port change in async_db.py is because I've pointed it at the dockerized postgres instance, to avoid clashing with the system one. If pgbouncer was not kicking in, that was the case for both runs.
Agreed, the p50 vs p99 results are surprising on your machine, but I don't see them at all on my side.
Will run it again tonight with more workers and on the default postgres port.
Will also create a new route that runs 3 queries, just to preempt comments.
I originally added nginx for realism, but without it the sync workers seem to do about 20% better (asyncio remains the same), so removing it might make the situation harder to judge. I plan to look into this before I publish, but for various reasons most people don't run without a reverse proxy these days, so I wanted to include one.
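For context, the proxy in front is just a plain reverse proxy; a minimal sketch of the kind of nginx config involved (upstream name and port are assumptions, not the repo's actual setup):

```nginx
# Hypothetical minimal reverse proxy -- not the repo's actual config
upstream app {
    server 127.0.0.1:8000;  # gunicorn/uvicorn assumed to listen here
}

server {
    listen 80;
    location / {
        proxy_pass http://app;
        proxy_set_header Host $host;
    }
}
```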
I noted the port change because I thought it might mean I'd been running the benchmark without pgbouncer for the asyncio workers, but actually it doesn't seem to make a difference. I tend to use 5432 for real postgres and 6432 for pgbouncer. All of the apps have connection pooling too, and I'm not sure offhand how much this contributes.
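To make the pgbouncer-vs-direct distinction concrete, here's a hypothetical sketch of selecting the port when building a DSN (the helper name, user, and database are made up, not taken from async_db.py):

```python
import os

# Hypothetical helper -- names and defaults are assumptions, not the
# repo's actual code. 6432 = pgbouncer, 5432 = postgres directly.
def database_dsn(use_pgbouncer: bool = True) -> str:
    host = os.environ.get("PWP_DB_HOST", "localhost")
    port = 6432 if use_pgbouncer else 5432
    return f"postgresql://pwp@{host}:{port}/pwp"

print(database_dsn())       # routes through pgbouncer
print(database_dsn(False))  # talks to postgres directly
```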
Some of the difference could be down to macos vs linux process scheduling.