nginx + uwsgi + django: Random upstream prematurely closed connection
Hi,
I've deployed my app with nginx + uwsgi.
It seems to be working fine except for the fact that every now and then, a request will yield a 502 error code.
When this happens, I get a line like this in nginx/error.log:
2018/06/14 10:55:03 [error] 4674#4674: *18888 upstream prematurely closed connection while reading response header from upstream, client: 192.168.100.1, server: easyset.eu, request: "GET /main_menu_icons/paid_plans.svg HTTP/1.1", upstream: "uwsgi://unix:///tmp/mars.sock:", host: "www.easyset.eu", referrer: "https://www.easyset.eu/"
It does not matter what request is made - the error is completely random, on all types of requests.
The relevant section of my nginx site.conf:
upstream main {
    server unix:///tmp/mars.sock; # for a file socket
    # server 127.0.0.1:8001; # for a web port socket (we'll use this first)
}
location / {
    include uwsgi_params;
    uwsgi_pass main;
    uwsgi_buffers 256 16k;
}
and uwsgi configuration:
master = true
#more processes, more computing power
processes = 32
threads = 3
post-buffering = 1
# this one must match net.core.somaxconn
listen = 8000
uid = mars
gid = mars
# add-gid = mars
# http-socket = 0.0.0.0:8080
socket = /tmp/mars.sock
stats = /tmp/mars_stats.sock
chown-socket = mars:mars
chmod-socket = 666
vacuum = true
socket-timeout = 300
harakiri = 300
max-worker-lifetime = 35
I've been researching this issue for more than a month now, but none of the remedies suggested seems to be working for me. I also did not detect the randomness factor in the other reported issues. Usually there was some misconfiguration involved, but not in my case: exactly the same command will work 90% of the time and fail the other 10% of the time.
It's quite annoying as pages only load partially and one can never be sure what has loaded and what hasn't.
I'd be grateful for any suggestions on what I'm missing here. Jure
Do you guys offer paid support? This is killing me and I'm not afraid of paying for a solution.
Try increasing max-worker-lifetime. 35 seconds is very low.
It is set to 35 seconds only because otherwise there were even more 502s. A busy worker will not be restarted before it completes its current job, so I've seen no ill effects of lowering this setting (except for a bit of CPU usage).
Hi there,
Did you find a solution by chance? I have been struggling the same way you did for days now, it is truly driving me nuts. For the same page, sometimes it displays properly, sometimes not, I really hate this kind of random behavior! I added logging to my Django app but got nothing, and the error in nginx/error.log is not explicit at all. The lifetime parameter is (from what I read on the internet) meant for when you upload heavy files. But in my case, I am just talking about displaying light pages!
Many thanks for your answer!
Well, actually, I did not. As a fallback I'm currently also running an instance of Apache with mod_wsgi. Considering just moving the whole thing back to it. Had no issues like this with that combination.
Damn it. I wanted to move to gunicorn because with Apache2+mod_wsgi I have the same behavior (always on the same group of pages, I realize now), and reading the apache2 log just gives Truncated or oversized response headers received from daemon process with a wonderful Segmentation fault (11). There must be something else, not related to apache2 nor nginx, but I can't find what. Exhausting 😩
Thank you for your answer by the way!
@KrazyMax you are using threads, are you sure your app is thread-safe? Try disabling them.
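(A minimal sketch of what that test could look like in the uwsgi ini posted above, assuming everything else stays unchanged; it simply drops threading so each worker runs single-threaded:

master = true
processes = 32
# threads = 3   (commented out while testing whether the app is thread-safe)
)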
Hi @rdeioris! Well, I am not an Apache2/Nginx expert at all, and I also did not ask myself any questions regarding threading, actually. I did not define threads in particular.
I changed logging in Apache2 from warn to info to get further details, and I've got this:
[Thu Aug 16 15:23:56.868288 2018] [wsgi:info] [pid 21505:tid 139948258772864] mod_wsgi (pid=21505): Starting process 'mywebsite.fr' with uid=33, gid=33 and threads=15.
So I guess it means I multi-thread with 15 threads. Sorry to look like a noob, but that's what I am regarding server confs.
I changed those 2 parameters in mpm_event.conf: ThreadsPerChild to 1 (default 25) and MaxSpareThreads to 1 (default is 4). Unfortunately, even after a reload and a stop/start of apache2, same issue.
Something interesting I noticed on one page which sometimes displays correctly and sometimes throws a server error: when I get the error, a force refresh in my web browser almost always makes it display correctly!
Final comment @velis74: I identified the issue. I wanted to be able to use f-strings. As this feature is only available on Python 3.6+, I installed Python 3.7, without thinking at all that the issue could come from a stable release of Python. So I decided to delete anything related to f-strings in my own code, then go back to my Python 3.5, just to see. And that was it. I followed every tutorial I found on the internet, from the Python website to mod_wsgi, installing Python 3.7 by sharing it as recommended and so on, but I had no error message, no clue that Python would be at fault.
So maybe have a look at how you installed Python on your server, because now the issue is solved for Apache2 as well as for Nginx!
I'm mad at myself, because I was too confident in my Python 3.7 install, which did not throw any error at install time or when Apache2/Nginx started to run.
@KrazyMax So you're saying that using an unsupported feature (f-strings) with python 3.5 resulted in RANDOM 502s on your server? Wouldn't that, like, fail EVERY time? How does one debug something like that????
The error seems to happen after many seconds. From https://monicalent.com/blog/2013/12/06/set-up-nginx-and-uwsgi/ I have found out that the "limit-as" uwsgi option should be increased to let memory allocations pass.
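(A sketch of what that could look like in the uwsgi ini; the value below is purely illustrative and has to be sized to your app's memory footprint, since limit-as is expressed in megabytes:

# illustrative only: cap each worker's address space at 1024 MB
limit-as = 1024
)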
I too got the same error upstream prematurely closed connection while reading response header from upstream, but for an nginx-uwsgi-flask docker image. In my case, the issue was that in the Dockerfile I had installed uwsgi using apt-get. So, to fix this I had 2 options:
- Update the Dockerfile to install the Python plugin uwsgi-plugin-python through apt-get and add plugins = python to the uwsgi app config.
- Update the Dockerfile to install uwsgi using pip instead of apt-get.
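(A rough sketch of both options, assuming a Debian/Ubuntu based image; package names are assumptions, adjust them to your base image and Python version:

Option 1 (Dockerfile plus uwsgi config):
RUN apt-get update && apt-get install -y uwsgi uwsgi-plugin-python
# and in the uwsgi app config:
plugins = python

Option 2 (Dockerfile only):
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install uwsgi
)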
I will try this
I get the same problem, did you resolve it? @velis74
nope
@velis74 you should probably get some hints in the uwsgi log. I'd give it a try, selectively removing the following configurations:
post-buffering = 1
harakiri = 300
max-worker-lifetime = 35
I have solved this problem by changing the uwsgi protocol to the http protocol.
uwsgi.ini
[uwsgi]
http=127.0.0.1:7020
chdir=/opt/www/sqdashi/savemoney/src/main/python/com
wsgi-file=savemoney/wsgi.py
processes=16
threads=2
master=True
pidfile=uwsgi.pid
daemonize=/opt/www/sqdashi/savemoney/src/main/python/com/log/uwsgi-@(exec://date +%%Y-%%m-%%d).log
nginx.conf
proxy_pass http://127.0.0.1:7020;
Test: I ran one thousand requests and no error happened.
You can try this if you still want, @velis74.
Thanks, @hanhuizhu I will definitely try this. I thought sockets were the most reliable and fastest solution, so it never occurred to me to switch to something else...
@hanhuizhu, thank you! I was hitting exactly the same issue and I was struggling to find a solution. The service worked randomly. Moving away from sockets did the trick for me as well... For the record, in case it helps someone else, my setup was dockerized and uwsgi was installed through pip.
This helped me a lot: https://www.codesd.com/item/django-gunicorn-nginx-download-large-error-file-502.html. Also check the Gunicorn status: sudo systemctl status gunicorn. In my case, there were no permissions for saving in the directory 😆
Probably related/duplicate to this : https://github.com/unbit/uwsgi/issues/1702
Removing --threads=4 seems to fix the issue here?!
I have met a similar issue: when uwsgi is set up with multiple threads and a worker reaches reload_on_rss, the uwsgi worker respawns, but nginx gets a 502 error. uwsgi seems not to wait for the response to complete.
I have tried shifting my workload from gunicorn to uwsgi and immediately hit this issue. I have created a minimal example which allows everyone to reproduce this in seconds, all you need is Docker: https://github.com/joekohlsdorf/uwsgi-nginx-bug
If there is anything more we can contribute to fixing this bug I'll be happy to help. My biggest issue is that uwsgi is silent about these errors so I have no idea where to start looking.
Copying the README of my repo which has some additional info:
This repo is a minimal example to reliably reproduce a bug where uwsgi closes connections prematurely.
How To Run
You need Docker to run this demo.
- Checkout the code
- Run make up
- Wait a few seconds and you'll see error messages upstream prematurely closed connection while reading response header from upstream from the nginx container appearing
- You will also see the logstream of uwsgi, which is clean.
- Hit CTRL-C and everything will stop within a couple of seconds.
What Does It Run
- The container workload is an empty Django project. Django is served by uwsgi in worker mode running 2 processes.
- The container nginx runs an nginx reverse proxy to uwsgi.
- The container ab runs apachebench against the nginx container.
Frequently Asked Questions
- Change setting XYZ of uwsgi.
  - I sadly haven't found any setting which fixes this.
- You are overflowing the listen backlog.
  - uwsgi has a mechanism to alert when this happens; it isn't the cause of the problem.
- This is a benchmark problem.
  - Exactly the same happens with real traffic.
- The problem is caused by Nginx.
  - This does not happen with gunicorn.
- How can I change the uwsgi config?
  - Adjust uwsgi.ini and run again.
- How can I change what is installed in the Docker images?
  - Adjust the Dockerfile and run make rebuild.
Further illustration of the problem
In the below graph you can see two days of these errors in a production environment.
The first day is uwsgi, the second day is gunicorn. Similar traffic, same number of processes. gunicorn errors are due to timeout (gunicorn's harakiri); unexpected errors on gunicorn are 0.
This environment is running with very low load, about 30% worker usage.
@joekohlsdorf have you tried enlarging the listen queue? The gunicorn one is something like 20x the uwsgi default.
Mine is at 8000, that should be enough, I should think.
There are so many reasons to have this kind of error rates that this issue is becoming a source of random solutions :(
The (probably incomplete) list of things to check:
- listen queue (the listen/-l option of uWSGI is not enough, you need to check the operating system limits too)
- timeouts: the default socket timeout is low for this kind of benchmark; you can raise it with --socket-timeout. Check nginx's timeouts for the uwsgi protocol too.
- non thread-safe app: a lot of users do not realize that apps are not thread-safe by default (most of them are totally non thread-safe); just disable multithreading if this is the case.
- non fork-safe app: your app could have issues with uWSGI's preforking model; just disable it with --lazy-apps.
- buffer-size limit: this is logged, but some requests could overflow the 4k buffer limit (this can happen for requests with a lot of headers). Tune it with --buffer-size.
- changing between --socket and --http-socket is meaningful only if you are configuring nginx with different tuning values between the uwsgi and http protocols.
- adding --http (not --http-socket, they are different!) is wrong, as you will end up adding another proxy between nginx and uWSGI. It generally solves the issue because the uWSGI http router is built for being exposed to the public, so it is way more tolerant and has bigger defaults.
- running external processes: long story short, if you need to spawn processes during requests, add --close-on-exec.
And please, please, stop creating huge backlog queues (they are ok for benchmarks, but then set them to meaningful values based on your QoS policies); they consume precious kernel memory.
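(To make the list above concrete, a minimal sketch of the relevant knobs; every value is illustrative only and should be set according to your own load and QoS policies:

# uwsgi.ini, illustrative values only
# backlog, must not exceed the operating system limit below
listen = 1024
# the default socket timeout is low for heavy load / benchmarks
socket-timeout = 60
# raise the 4k default if requests carry many headers
buffer-size = 16384
# only if the app is not fork-safe
lazy-apps = true
# only if requests spawn external processes
close-on-exec = true

# matching operating system limit (run as root)
# sysctl -w net.core.somaxconn=1024
)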
Until now I was under the impression that uwsgi would warn when the listen queue gets full; there are several related settings but I never got any warning, which is why I never changed this setting. So these errors might not be shown when running with --http, which is problematic.
I raised the listen queue to 500 and my test passes without a single error!
Can we please agree that the default listen queue is too small? This is not a benchmark, I showed you 48 hours of real world errors comparing uwsgi to gunicorn on a large scale deployment across hundreds of instances running with very low average worker load (30% target for the test). This is not a random hiccup but a consistent problem.
Let's compare the default listen backlog of some other application servers to uwsgi:
- uwsgi: 100
- bjoern: 1024
- gunicorn: 2048
- meinheld: 2048
- nginx: 511 on Linux
- Apache: 511
- lighttpd: 1024
The webservers are not very comparable because they generally don't have to wait for a worker to become available but for reference I still included them.
Test result with listen backlog 500 (0 errors) from my repo; as you can see from the response times, this test is not at all trying to overflow the system:
ab_1 | Requests per second: 388.00 [#/sec] (mean)
ab_1 | Time per request: 515.457 [ms] (mean)
ab_1 | Time per request: 2.577 [ms] (mean, across all concurrent requests)
ab_1 | Transfer rate: 6260.37 [Kbytes/sec] received
ab_1 |
ab_1 | Connection Times (ms)
ab_1 | min mean[+/-sd] median max
ab_1 | Connect: 0 0 0.9 0 54
ab_1 | Processing: 13 515 54.1 505 885
ab_1 | Waiting: 13 514 53.8 504 881
ab_1 | Total: 35 515 53.9 505 885
ab_1 |
ab_1 | Percentage of the requests served within a certain time (ms)
ab_1 | 50% 505
ab_1 | 66% 521
ab_1 | 75% 538
ab_1 | 80% 551
ab_1 | 90% 581
ab_1 | 95% 607
ab_1 | 98% 650
ab_1 | 99% 697
ab_1 | 100% 885 (longest request)