
Unexplainable timeout error when using uwsgi behind nginx

Open bow opened this issue 6 years ago • 82 comments

Hi everyone,

I'm using uwsgi (version 2.0.13.1) behind nginx (version 1.10.3) to run a Flask application on Python 3 on Ubuntu 14.04.03. The connection to nginx is done via https, but nginx connects to uwsgi via an IPC socket (config below).

I've been getting some timeout errors from nginx that I can't explain. Here are some examples:

2017/08/17 17:38:02 [error] 22887#22887: *613 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <CLIENT_IP>, server: _, request: "POST <URL> HTTP/1.1
2017/08/31 10:39:22 [error] 5904#5904: *464 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <CLIENT_IP>, server: _, request: "POST <URL> HTTP/1.1

Here's my uwsgi config file:

[uwsgi]
module = wsgi:create_app()

master = true
processes = 5

socket = /tmp/app.sock
chown-socket = app_user:www-data
chmod-socket = 660
# abort blocking socket operations after 60 seconds
socket-timeout = 60
# kill and recycle any worker stuck on a single request for more than 60 seconds
harakiri = 60
vacuum = true

enable-threads = true
die-on-term = true
single-interpreter = true
harakiri-verbose = true

logger = rsyslog:localhost:514,app-uwsgi,14
stats = 127.0.0.1:9090

And here are my uwsgi_params settings in nginx:

        include                         uwsgi_params;
        uwsgi_param                     X-Real-IP $remote_addr;
        uwsgi_param                     X-Forwarded-For $proxy_add_x_forwarded_for;
        uwsgi_param                     X-Forwarded-Proto $http_x_forwarded_proto;
        uwsgi_pass                      unix:/tmp/app.sock;
        uwsgi_read_timeout              60;
        uwsgi_send_timeout              60;

The error occurs randomly: with the exact same payload, the first request may fail while sending it again works fine. So I'm not exactly sure whether increasing the timeout would help.

What I also don't understand is this part of the log messages:

22887#22887: *613 upstream timed out 

What does 22887#22887 mean? Why does it say *613 upstream ..? These numbers are different in every error message.

And ultimately, is there a way to reproduce this reliably and then fix it?

bow avatar Aug 31 '17 09:08 bow

Does the problem still happen when using the current release?

funkybob avatar Oct 04 '17 08:10 funkybob

I'm seeing the same thing with uWSGI 2.0.15, with the same setup. nginx in front of uWSGI vending a Python 3 Flask app.

dsully avatar Oct 10 '17 05:10 dsully

For the requests that don't get a timeout, how long are they typically taking? Are they close to the timeout?

Also, are there any logs on the uwsgi side?

As to the meaning of those numbers in the nginx logs, you'd probably need to ask the nginx folks :)

funkybob avatar Oct 10 '17 05:10 funkybob

Not close at all - timeout is 60 seconds. Most requests are serviced in under 1 second.

I see uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 296] in the uWSGI logs.

dsully avatar Oct 10 '17 15:10 dsully

@funkybob, I haven't switched to a new version. I did eventually increase my socket-timeout and harakiri config values, both to 120. I can't say whether this fixed the problem per se, since I had already noticed the problem occurring less often even before the change. I can say that I'm not seeing the errors anymore (though I'm still not 100% sure, given that I don't know exactly what the cause was).

This particular error is only ever triggered when one of my POST endpoints is hit (from another, upstream service). This endpoint is hit relatively rarely compared to the others. Unfortunately, I'm not sure I have response-time data from that service. I can say that the payload from the upstream service was received completely, as I can see this application's database being updated, which makes the reason for erroring out all the more puzzling to me.

On the uwsgi side, I'm seeing a lot of this:

SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request [POST endpoint]

around the time the error occurred.

I also see @dsully's uwsgi error message, but on a different endpoint.

bow avatar Oct 11 '17 12:10 bow

The "uwsgi_response_writev_headers_and_body_do" message comes when uWSGI is trying to send the headers and body of the response in one go.

It is not, it seems, a timeout - that's caught later in the same function.

So, yes, your app has handled the request, and returned a response for uWSGI to pass on, but something went wrong.
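
To make "Broken pipe" concrete, here is a minimal sketch (illustrative only, not uWSGI's actual code): writing to a socket whose peer has already closed fails with EPIPE, which is what the writer reports.

import socket

# Illustrative only, not uWSGI internals: a write to a socket whose
# peer has gone away raises EPIPE ("Broken pipe").
server, client = socket.socketpair()
client.close()  # simulate the client (or nginx) disconnecting
try:
    server.sendall(b"HTTP/1.1 200 OK\r\n\r\nbody")
except BrokenPipeError as exc:
    print("write failed:", exc)  # what uWSGI logs as a broken pipe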

It would be interesting to know exactly how long uWSGI is taking to process that request. I recently wrote a patch for someone that logs request details as they are parsed coming in...

funkybob avatar Oct 11 '17 21:10 funkybob

@bow that looks like a regular "the browser went away" message... the most common cause is the user hitting refresh, but it can also come from XHR cancelling the request. There are ways to tell nginx to signal uwsgi about this, but having your app notice and handle it well is a little tricky.
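
In a streaming Flask view, for example, the disconnect surfaces as the server closing the response iterator. A hedged sketch (the endpoint and chunking are purely illustrative):

from flask import Flask, Response

app = Flask(__name__)

@app.route("/stream")  # illustrative endpoint
def stream():
    def generate():
        try:
            for i in range(1000):
                yield f"chunk {i}\n"
        except GeneratorExit:
            # PEP 3333 servers call close() on the response iterable
            # when the client goes away mid-response; clean up here.
            app.logger.info("client disconnected mid-response")
            raise
    return Response(generate(), mimetype="text/plain")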

funkybob avatar Oct 11 '17 21:10 funkybob

This seems very similar to the timeouts I get in #1602.

kylemacfarlane avatar Oct 17 '17 17:10 kylemacfarlane

Any advancements on this?

DiegoGallegos4 avatar Dec 12 '17 18:12 DiegoGallegos4

I have the same issue. So far I haven't solved it properly, but I found that when I set processes to 1, the problem is gone.

My stack:

python 2.7.13
debian 9
nginx 1.10.3
uwsgi 2.0.14-debian

kmadac avatar Jan 21 '18 06:01 kmadac

Well, in my case I discovered uwsgi is very CPU hungry, so I had to increase the vcpus of my instance.

DiegoGallegos4 avatar Jan 21 '18 16:01 DiegoGallegos4

Well, I didn't set the number of vcpus; I set the number of uwsgi workers to 1 in the app's uwsgi .ini file. You have 5 in your configuration. I know it's not the configuration you want if you have more CPUs, but could you please try it to check whether it fixes the 'broken pipe' issue?

kmadac avatar Jan 21 '18 16:01 kmadac

I am facing the same issue. Has anyone found a solution?

dbelieves avatar Feb 01 '18 15:02 dbelieves

Hi guys, I think we had the same problem: uwsgi closed the connection but continued processing. Setting/extending the http-timeout fixed it for us (http://uwsgi-docs.readthedocs.io/en/latest/Options.html#http-timeout).
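
Note that http-timeout configures uWSGI's own HTTP router, so it only takes effect when uWSGI terminates HTTP itself rather than sitting behind nginx on a uwsgi socket. A minimal sketch, with illustrative values:

[uwsgi]
# http-timeout applies to uWSGI's HTTP router (the http option);
# both values below are illustrative
http = 0.0.0.0:8080
http-timeout = 300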

KoleS46 avatar Feb 26 '18 14:02 KoleS46

I am facing the same issue after I upgraded my app from py2 to py3, with the same nginx and ini files.

OS: CentOS release 6.6 (Final)
Python: 2.7.13 to 3.6.4
uWSGI: 2.0.14 and above
Tue Feb 27 16:56:18 2018 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/some-api/ (ip xxx) !!!
Tue Feb 27 16:56:20 2018 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/some-api/ (ip xxx) !!!
Tue Feb 27 16:56:26 2018 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/some-api/ (ip xxx) !!!
Tue Feb 27 16:56:26 2018 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /api/some-api/ (ip xxx) !!!
...
# ini file
[uwsgi]
master = true
chdir = /home/yyy/xx
module = xx.wsgi:application
processes = 20
socket = 0.0.0.0:8001
stats = 0.0.0.0:8002
# clear environment on exit
vacuum = true
max-requests = 500
#chmod-socket = 666
daemonize=/home/logs/uwsgi.log
pidfile=/home/uwsgi.pid
# Otherwise there'll be lots of requests logs.
disable-logging=true
# enlarge request query size
buffer-size = 655350
socket-timeout = 300
http-timeout = 300
harakiri = 300

I've tried socket-timeout, http-timeout and harakiri, but no luck...

# nginx.conf
server {
        listen       80;
        server_name  x.y.z.com;
        #access_log  logs/host.access.log  main;

        # for larger client request header
        large_client_header_buffers 4 1096k;

        # for modifier, plz refer to 
        # https://www.digitalocean.com/community/tutorials/understanding-nginx-server-and-location-block-selection-algorithms
        location ^~ /static/ {
                alias /home/ucmt/dv/static/;
        }
        location ~ /(.*)$ {
                include uwsgi_params;
                uwsgi_read_timeout 600s;
                uwsgi_pass ll_uwsgi_backend;
        }
}
....

Please kindly help me out, thx
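
(From the docs, http-timeout appears to apply only to uWSGI's own HTTP router, not to socket = behind nginx, so the nginx-side timeouts may be the knobs that matter here. A hedged sketch with illustrative values:)

        location ~ /(.*)$ {
                include uwsgi_params;
                uwsgi_connect_timeout 30s;   # time allowed to establish the upstream connection
                uwsgi_send_timeout 600s;     # max gap between two writes to uwsgi
                uwsgi_read_timeout 600s;     # max gap between two reads from uwsgi
                uwsgi_pass ll_uwsgi_backend;
        }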

ace-han avatar Feb 27 '18 09:02 ace-han

Like the other people above me, I too was pulling my hair out as to why my requests were timing out.

My symptoms are as follows. At first, everything would work as expected. But as soon as I hit my endpoints a couple of times a second, everything locks up and Nginx serves me 502 Bad Gateway errors. Only after restarting the uwsgi workers can I accept requests again. Like the others, I see the following errors:

SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected)
uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 296]

The suggestion by @kmadac works wonders, however! Setting processes = 1 in my uWSGI configuration file resolved the problem of my endpoints locking up/timing out. Interestingly enough, the errors keep occurring, so I'm not sure to what extent the error messages are part of the problem. It is furthermore worth noting that, for me, the problem occurred regardless of whether a GET or POST request was being executed.

I understand that this certainly won't be an appropriate fix for everyone, but it might be a good workaround for others until the underlying issue is found and fixed.
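
For anyone trying this workaround while keeping some concurrency, a hedged sketch of the single-worker variant (assuming the app is thread-safe; values illustrative):

[uwsgi]
# workaround sketch: one worker process, a few threads for concurrency
processes = 1
threads = 4
enable-threads = true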

My setup:

  • Ubuntu 16.04
  • Python 3.6.4
  • uWSGI 2.0.15
  • Nginx 1.10.3

mvdpanne avatar Mar 02 '18 23:03 mvdpanne

Unfortunately, after testing the 'workaround' some more, the problem started appearing again. So it does not go away; it just happens less frequently, and as far as I can tell it occurs randomly.

mvdpanne avatar Mar 05 '18 22:03 mvdpanne

The following worked for me as a workaround; I am temporarily downloading a huge file, which might be the case for others.

  • uwsgi ini: http-timeout = 3600000
  • nginx conf: uwsgi_max_temp_file_size 20480m;

redwud avatar Mar 09 '18 07:03 redwud

I met the same problem, but only with one particular app; my other Python code works fine. The failing app uses the gensim, keras, and tensorflow libraries for deep-learning serving. Its word2vec file is almost 500 MB and its deep-learning model is almost 600 MB; these huge files are loaded only once, at app initialization time.

The same code works fine as a single stand-alone Flask app; it only produces this error in the Nginx + uwsgi + flask setup.
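
A possibly related detail (an assumption, not verified here): uWSGI imports the app once in the master process and then forks workers, and some ML runtimes misbehave when their state is created before the fork. uWSGI's lazy-apps option loads the app separately in each worker; deferring the model load until first use is a similar idea. A hedged sketch (the model path and endpoint are illustrative):

from flask import Flask

app = Flask(__name__)
_model = None  # loaded lazily, once per worker

def get_model():
    # Defer the heavy load until the first request in each worker,
    # so nothing large is created before uWSGI forks.
    global _model
    if _model is None:
        from keras.models import load_model
        _model = load_model("model.h5")  # illustrative path
    return _model

@app.route("/predict")  # illustrative endpoint
def predict():
    return {"model_loaded": get_model() is not None}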

hoondongkim avatar Mar 21 '18 05:03 hoondongkim

For me the issue came from Redis, only on pages where Redis was called.

adibalcan avatar Mar 21 '18 07:03 adibalcan

same for me, keras uwsgi flask = 504

canivel avatar Apr 24 '18 18:04 canivel

I'm seeing the same issue with an nginx -> uwsgi -> django configuration, all hooked up in a docker-compose network.

Here's the trio of messages I'm seeing in the logs:

  1. SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /my/transaction/hook/ (ip remote-ip) !!!
  2. uwsgi_response_writev_headers_and_body_do(): Broken pipe [core/writer.c line 287] during POST [/my/transaction/hook/] (remote-ip)
  3. IOError: write error

I then also see my nginx container returning a 499 (client closed request) response code.

And my config:

  • Ubuntu 16.04.2
  • Python 2.7.9
  • uWSGI 2.0.7
  • Nginx 1.9.15
  • docker 17.05.0
  • docker-compose 1.9.0
  • Postgres 9.4.15

It's probably also worth noting that I saw an uptick in these exceptions after moving my database from the EC2 instance it shared with my application to an external RDS.

tannart avatar Jun 15 '18 09:06 tannart

@MacFarlaneBro does anything improve if you update to a current release of uWSGI?

funkybob avatar Jun 15 '18 10:06 funkybob

@funkybob I'll try that out and report back

tannart avatar Jun 15 '18 10:06 tannart

Exactly the same problem using uwsgi with unix socket:

  • Python 3.5.5 / 3.6.3
  • uWSGI 2.0.17 (also tested with uWSGI 2.0.13.1)
  • nginx 1.12.2
  • Django 2.0.6

Our projects are deployed on a k8s cluster. I have the same problem in all the deployed Python 3 projects. No problems in other web projects using:

  • Python 2.7.14
  • uWSGI 2.0.17
  • nginx 1.12.2
  • Django 1.x

faxioman avatar Jun 19 '18 11:06 faxioman

@funkybob So I upgraded to uWSGI 2.0.17 a couple of days ago and I'm still seeing the issue.

tannart avatar Jun 19 '18 11:06 tannart

I'm experiencing something similar with uwsgi 2.0.17 behind nginx 1.10.3 (communicating over a socket file, uwsgi protocol), running in a Debian 9 Docker container with Python 3.6.5. In my case the first requests sometimes work. After some random amount of time (or some condition I haven't been able to identify; it has even happened that no requests ever work), all further requests hang. Nginx times out after the configured uwsgi_read_timeout and logs this:

2018/06/20 00:38:05 [error] 7#7: *218 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <REDACTED_IP>, server: , request: "POST /tokenize HTTP/1.1", upstream: "uwsgi://unix:/tmp/uwsgi.sock", host: "<REDACTED_URL>:5000"

But uwsgi is not logging anything at all for those requests (the initialization messages do come up).
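
When requests hang with nothing in the uWSGI log, the stats server (enabled with stats = in the configs above) shows what each worker is doing. A hedged sketch of reading it directly, assuming stats = 127.0.0.1:9090; the uwsgitop tool wraps the same endpoint interactively:

import json
import socket

# The uWSGI stats server streams one JSON document per connection.
conn = socket.create_connection(("127.0.0.1", 9090))
data = b""
while chunk := conn.recv(4096):
    data += chunk
conn.close()

stats = json.loads(data)
for worker in stats["workers"]:
    # workers stuck in "busy" for a long time point at hung requests
    print(worker["id"], worker["status"], worker["requests"])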

alexvy86 avatar Jun 20 '18 01:06 alexvy86

@adibalcan might be on to something when they mention Redis. The two possibly related bugs I opened (#1602 and #1716) both use Redis. In fact the OSError in #1716 is coming from Redis.

@MacFarlaneBro says that the problem became worse when switching to remote RDS but I find the opposite. When I develop locally with Redis/Postgres I constantly get timeouts even as a single user but it's pretty rare in production with Elasticache/RDS until the server comes under heavier load.

kylemacfarlane avatar Jun 20 '18 01:06 kylemacfarlane

@kylemacfarlane I solved the problem: my Redis connection lasts just 5 minutes; after that, my app tries to reconnect to Redis but takes a very long time (maybe forever), and the server responds with a timeout.
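
A common guard for this failure mode (a hedged sketch, assuming redis-py; host, port, and timeout values are illustrative) is to bound how long a dead or idle connection can stall a request:

import redis

# Bound connect and I/O time so a dead connection fails fast instead
# of hanging a worker; health_check_interval revalidates idle
# connections before reuse. All values are illustrative.
r = redis.Redis(
    host="localhost",
    port=6379,
    socket_connect_timeout=5,
    socket_timeout=5,
    health_check_interval=30,
)
r.ping()  # fails fast with ConnectionError if Redis is unreachable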

adibalcan avatar Jun 20 '18 06:06 adibalcan

@adibalcan Are you sure you were getting the same timeout error as everyone else? That sounds completely different to what others are reporting.

My problem (and I assume most others here) is that writing to the web client connected to uWSGI "times out" immediately. I'm not getting any timeouts when connecting to Redis, Postgres, etc.

The reason I bring up Redis is because in #1716, if I suppress the OSError from Redis, it turns out that the web client is dead too, which is similar to this bug. To me it seems like the common thread in all the issues I'm having is that uWSGI loses track of open sockets, and an incompatibility with redis-py or psycopg2 may be what causes this. This would also explain why both @MacFarlaneBro and I see a difference (even if in opposite directions) between local and remote cache/db servers.

kylemacfarlane avatar Jun 20 '18 17:06 kylemacfarlane