nginx + uwsgi + django: Random upstream prematurely closed connection
Hi,
I've deployed my app with nginx + uwsgi.
It seems to be working fine except for the fact that every now and then, a request will yield a 502 error code.
When this happens, I get a line like this in nginx/error.log:
2018/06/14 10:55:03 [error] 4674#4674: *18888 upstream prematurely closed connection while reading response header from upstream, client: 192.168.100.1, server: easyset.eu, request: "GET /main_menu_icons/paid_plans.svg HTTP/1.1", upstream: "uwsgi://unix:///tmp/mars.sock:", host: "www.easyset.eu", referrer: "https://www.easyset.eu/"
It does not matter what request is made - the error is completely random, on all types of requests.
The relevant section of my nginx site.conf:
upstream main {
    server unix:///tmp/mars.sock; # for a file socket
    # server 127.0.0.1:8001; # for a web port socket (we'll use this first)
}
location / {
    include uwsgi_params;
    uwsgi_pass main;
    uwsgi_buffers 256 16k;
}
and uwsgi configuration:
master = true
#more processes, more computing power
processes = 32
threads = 3
post-buffering = 1
# this one must match net.core.somaxconn
listen = 8000
uid = mars
gid = mars
# add-gid = mars
# http-socket = 0.0.0.0:8080
socket = /tmp/mars.sock
stats = /tmp/mars_stats.sock
chown-socket = mars:mars
chmod-socket = 666
vacuum = true
socket-timeout = 300
harakiri = 300
max-worker-lifetime = 35
I've been researching this issue for more than a month now, but none of the remedies suggested seems to be working for me. I also did not detect the randomness factor in the other reported issues. Usually there was some misconfiguration involved, but not in my case: exactly the same command will work 90% of the time and fail the other 10% of the time.
It's quite annoying as pages only load partially and one can never be sure what has loaded and what hasn't.
I'd be grateful for any suggestions on what I'm missing here. Jure
Do you guys offer paid support? This is killing me and I'm not afraid of paying for a solution.
Try increasing max-worker-lifetime. 35 seconds is very low.
It is set to 35 seconds only because otherwise there were even more 502s. A busy worker will not be restarted before it completes its current job, so I've seen no ill effects of lowering this setting (except for a bit of CPU usage).
Hi there,
Did you find a solution by chance? I have been struggling the same way you did for days now, it is truly driving me nuts. For the same page, sometimes it displays properly, sometimes not, I really hate this kind of random behavior! I added logging to my Django app but got nothing, and the error in nginx/error.log is not explicit at all. The lifetime parameter is (from what I read on the internet) meant for when you upload heavy files. But in my case, I am just talking about displaying light pages!
Many thanks for your answer!
Well, actually, I did not. As a fallback I'm currently also running an instance of Apache with mod_wsgi. Considering just moving the whole thing back to it. Had no issues like this with that combination.
Damn it. I wanted to move to gunicorn because with Apache2+mod_wsgi I have the same behavior (always on the same group of pages, I realize now), and reading the apache2 log just gives Truncated or oversized response headers received from daemon process with a wonderful Segmentation fault (11). There must be something else, not related to apache2 nor nginx, but I can't find what. Exhausting 😩
Thank you for your answer by the way!
@KrazyMax you are using threads, are you sure your app is thread-safe? Try disabling them.
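(A minimal sketch of what that test could look like in the uwsgi ini posted above, assuming everything else stays unchanged; it simply drops threading so each worker runs single-threaded:

master = true
processes = 32
# threads = 3   (commented out while testing whether the app is thread-safe)
)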
Hi @rdeioris! Well, I am not an Apache2/Nginx expert at all, and I also did not ask myself any questions regarding threading, actually. I did not define threads in particular.
I changed logging in Apache2 from warn to info to get further details, and I've got this:
[Thu Aug 16 15:23:56.868288 2018] [wsgi:info] [pid 21505:tid 139948258772864] mod_wsgi (pid=21505): Starting process 'mywebsite.fr' with uid=33, gid=33 and threads=15.
So I guess it means I multi-thread with 15 threads. Sorry to look like a noob, but that's what I am regarding server confs.
I changed those 2 parameters in mpm_event.conf: ThreadsPerChild to 1 (default 25) and MaxSpareThreads to 1 (default is 4). Unfortunately, even after a reload and a stop/start of apache2, same issue.
Something interesting I noticed on one page which sometimes displays correctly and sometimes throws a server error: when I get the error, a force refresh in my web browser almost always makes it display correctly!
Final comment @velis74: I identified the issue. I wanted to be able to use f-strings. As this feature is only available on Python 3.6+, I installed Python 3.7, without thinking at all that the issue could come from a stable release of Python. So I decided to delete anything related to f-strings in my own code, then go back to my Python 3.5, just to see. And that was it. I followed every tutorial I found on the internet, from the Python website to mod_wsgi, installing Python 3.7 by sharing it as recommended and so on, but I had no error message, no clue that Python would be at fault.
So maybe have a look at how you installed Python on your server, because now the issue is solved for Apache2 as well as for Nginx!
I'm mad at myself, because I was too confident in my Python 3.7 install, which did not throw any error at install time or when Apache2/Nginx started to run.
@KrazyMax So you're saying that using an unsupported feature (f-strings) with python 3.5 resulted in RANDOM 502s on your server? Wouldn't that, like, fail EVERY time? How does one debug something like that????
The error seems to happen after many seconds. From https://monicalent.com/blog/2013/12/06/set-up-nginx-and-uwsgi/ I have found out that the "limit-as" uwsgi option should be increased to let memory allocations pass.
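(A sketch of what that could look like in the uwsgi ini; the value below is purely illustrative and has to be sized to your app's memory footprint, since limit-as is expressed in megabytes:

# illustrative only: cap each worker's address space at 1024 MB
limit-as = 1024
)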
I too got the same error upstream prematurely closed connection while reading response header from upstream, but for an nginx-uwsgi-flask docker image. In my case, the issue was that in the Dockerfile I had installed uwsgi using apt-get. So, to fix this I had 2 options:
- Update the Dockerfile to install the Python plugin uwsgi-plugin-python through apt-get and add plugins = python to the uwsgi app config.
- Update the Dockerfile to install uwsgi using pip instead of apt-get.
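(A rough sketch of both options, assuming a Debian/Ubuntu based image; package names are assumptions, adjust them to your base image and Python version:

Option 1 (Dockerfile plus uwsgi config):
RUN apt-get update && apt-get install -y uwsgi uwsgi-plugin-python
# and in the uwsgi app config:
plugins = python

Option 2 (Dockerfile only):
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install uwsgi
)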
I will try this
I get the same problem, did you resolve it? @velis74
nope
@velis74 you should probably get some hints in the uwsgi log. I'd give it a try, selectively removing the following configurations:
post-buffering = 1
harakiri = 300
max-worker-lifetime = 35
I have solved this problem by changing the uwsgi protocol to the http protocol.
uwsgi.ini
[uwsgi]
http=127.0.0.1:7020
chdir=/opt/www/sqdashi/savemoney/src/main/python/com
wsgi-file=savemoney/wsgi.py
processes=16
threads=2
master=True
pidfile=uwsgi.pid
daemonize=/opt/www/sqdashi/savemoney/src/main/python/com/log/uwsgi-@(exec://date +%%Y-%%m-%%d).log
nginx.conf
proxy_pass http://127.0.0.1:7020;
Test: I ran one thousand requests and no error happened.
You can try this if you still want, @velis74.
Thanks, @hanhuizhu I will definitely try this. I thought sockets were the most reliable and fastest solution, so it never occurred to me to switch to something else...
@hanhuizhu, thank you! I was hitting exactly the same issue and I was struggling to find a solution. The service worked randomly. Moving away from sockets did the trick for me as well... For the record, in case it helps someone else, my setup was dockerized and uwsgi was installed through pip.
This helped me a lot: https://www.codesd.com/item/django-gunicorn-nginx-download-large-error-file-502.html. Also check the Gunicorn status: sudo systemctl status gunicorn. In my case, there were no permissions for saving in the directory 😆
Probably related/duplicate to this : https://github.com/unbit/uwsgi/issues/1702
Removing --threads=4 seems to fix the issue here?!
I have met a similar issue: when uwsgi is set up with multiple threads and a worker reaches reload_on_rss, the uwsgi worker respawns, but nginx gets a 502 error. uwsgi seems not to wait for the response to complete.
I have tried shifting my workload from gunicorn to uwsgi and immediately hit this issue. I have created a minimal example which allows everyone to reproduce this in seconds, all you need is Docker: https://github.com/joekohlsdorf/uwsgi-nginx-bug
If there is anything more we can contribute to fixing this bug I'll be happy to help. My biggest issue is that uwsgi is silent about these errors so I have no idea where to start looking.
Copying the README of my repo which has some additional info:
This repo is a minimal example to reliably reproduce a bug where uwsgi closes connections prematurely.
How To Run
You need Docker to run this demo.
- Checkout the code
- Run make up
- Wait a few seconds and you'll see error messages upstream prematurely closed connection while reading response header from upstream from the nginx container appearing
- You will also see the logstream of uwsgi, which is clean.
- Hit CTRL-C and everything will stop within a couple of seconds.
What Does It Run
- The container workload is an empty Django project. Django is served by uwsgi in worker mode running 2 processes.
- The container nginx runs an nginx reverse proxy to uwsgi.
- The container ab runs apachebench against the nginx container.
Frequently Asked Questions
- Change setting XYZ of uwsgi.
  - I sadly haven't found any setting which fixes this.
- You are overflowing the listen backlog.
  - uwsgi has a mechanism to alert when this happens; it isn't the cause of the problem.
- This is a benchmark problem.
  - Exactly the same happens with real traffic.
- The problem is caused by Nginx.
  - This does not happen with gunicorn.
- How can I change the uwsgi config?
  - Adjust uwsgi.ini and run again.
- How can I change what is installed in the Docker images?
  - Adjust the Dockerfile and run make rebuild.
Further illustration of the problem
In the below graph you can see two days of these errors in a production environment.
The first day is uwsgi, the second day is gunicorn. Similar traffic, same number of processes. gunicorn errors are due to timeout (gunicorn's harakiri); unexpected errors on gunicorn are 0.
This environment is running with very low load, about 30% worker usage.
@joekohlsdorf have you tried enlarging the listen queue? The gunicorn one is something like 20x the uwsgi default.
Mine is at 8000, that should be enough, I should think.
There are so many reasons to have this kind of error rates that this issue is becoming a source of random solutions :(
The (probably incomplete) list of things to check:
- listen queue (the listen/-l option of uWSGI is not enough, you need to check the operating system limits too)
- timeouts: the default socket timeout is low for this kind of benchmark; you can raise it with --socket-timeout. Check nginx's timeouts for the uwsgi protocol too.
- non thread-safe app: a lot of users do not realize that apps are not thread-safe by default (most of them are totally non thread-safe); just disable multithreading if this is the case.
- non fork-safe app: your app could have issues with uWSGI's preforking model; just disable it with --lazy-apps.
- buffer-size limit: this is logged, but some requests could overflow the 4k buffer limit (this can happen for requests with a lot of headers). Tune it with --buffer-size.
- changing between --socket and --http-socket is meaningful only if you are configuring nginx with different tuning values between the uwsgi and http protocols.
- adding --http (not --http-socket, they are different!) is wrong, as you will end up adding another proxy between nginx and uWSGI. It generally solves the issue because the uWSGI http router is built for being exposed to the public, so it is way more tolerant and has bigger defaults.
- running external processes: long story short, if you need to spawn processes during requests, add --close-on-exec.
And please, please, stop creating huge backlog queues (they are ok for benchmarks, but then set them to meaningful values based on your QoS policies); they consume precious kernel memory.
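(To make the list above concrete, a minimal sketch of the relevant knobs; every value is illustrative only and should be set according to your own load and QoS policies:

# uwsgi.ini, illustrative values only
# backlog, must not exceed the operating system limit below
listen = 1024
# the default socket timeout is low for heavy load / benchmarks
socket-timeout = 60
# raise the 4k default if requests carry many headers
buffer-size = 16384
# only if the app is not fork-safe
lazy-apps = true
# only if requests spawn external processes
close-on-exec = true

# matching operating system limit (run as root)
# sysctl -w net.core.somaxconn=1024
)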
Until now I was under the impression that uwsgi would warn when the listen queue gets full; there are several related settings but I never got any warning, which is why I never changed this setting. So these errors might not be shown when running with --http, which is problematic.
I raised the listen queue to 500 and my test passes without a single error!
Can we please agree that the default listen queue is too small? This is not a benchmark, I showed you 48 hours of real world errors comparing uwsgi to gunicorn on a large scale deployment across hundreds of instances running with very low average worker load (30% target for the test). This is not a random hiccup but a consistent problem.
Let's compare the default listen backlog of some other application servers to uwsgi:
- uwsgi: 100
- bjoern: 1024
- gunicorn: 2048
- meinheld: 2048
- nginx: 511 on Linux
- Apache: 511
- lighttpd: 1024
The webservers are not very comparable because they generally don't have to wait for a worker to become available but for reference I still included them.
Test result with listen backlog 500 (0 errors) from my repo; as you can see from the response times, this test is not at all trying to overflow the system:
ab_1 | Requests per second: 388.00 [#/sec] (mean)
ab_1 | Time per request: 515.457 [ms] (mean)
ab_1 | Time per request: 2.577 [ms] (mean, across all concurrent requests)
ab_1 | Transfer rate: 6260.37 [Kbytes/sec] received
ab_1 |
ab_1 | Connection Times (ms)
ab_1 | min mean[+/-sd] median max
ab_1 | Connect: 0 0 0.9 0 54
ab_1 | Processing: 13 515 54.1 505 885
ab_1 | Waiting: 13 514 53.8 504 881
ab_1 | Total: 35 515 53.9 505 885
ab_1 |
ab_1 | Percentage of the requests served within a certain time (ms)
ab_1 | 50% 505
ab_1 | 66% 521
ab_1 | 75% 538
ab_1 | 80% 551
ab_1 | 90% 581
ab_1 | 95% 607
ab_1 | 98% 650
ab_1 | 99% 697
ab_1 | 100% 885 (longest request)