mod_wsgi Frequent 504 Gateway timeout issues

Hi Team,

We are using mod_wsgi to host our Flask application on an EC2 server. Recently we started seeing intermittent 504 Gateway Timeout errors for our APIs. On investigating the mod_wsgi logs, we found these error messages:

[Fri Jun 03 14:13:36.294416 2022] [wsgi:error] [pid 23168] [client 10.235.52.42:29608] Timeout when reading response headers from daemon process 'revo6004rdhqa3': /var/www/app/6004/rdhqa3/service/deploy.wsgi, referer: <site_url>
[Fri Jun 03 14:13:36.726029 2022] [wsgi:error] [pid 22804] (11)Resource temporarily unavailable: [client 10.235.52.42:29552] mod_wsgi (pid=22804): Unable to connect to WSGI daemon process 'revo6004rdhqa3' on '/run/httpd/wsgi.26225.1.5.sock' after multiple attempts as listener backlog limit was exceeded or the socket does not exist., referer: <site_url>

Python version: 3.7
mod_wsgi version: mod-wsgi==4.9.0

Please find the wsgi configuration and error log files for your reference: wsgi.txt WSGI_LOG.txt

We have not been able to find the actual root cause. Could you please suggest what might be going wrong here, or how we can debug the issue further?

Let me know if you need more information on this.

Thanks

Jun 03 '22 16:06 shishir-22

You are most likely using a Python package which isn't designed properly to work in Python sub interpreters. This is causing deadlocks or hangs resulting in all the request handler threads getting stuck. At times psycopg2 has been known to not work properly in Python sub interpreters. Major packages like numpy and anything using it are also a big problem.

Since you are using daemon mode for the WSGI application, force the use of the main Python interpreter context.

https://modwsgi.readthedocs.io/en/master/user-guides/application-issues.html#python-simplified-gil-state-api
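
As a rough sketch of what forcing the main interpreter can look like in the Apache configuration: the group name `myapp` and the script path below are placeholders, not taken from the attached configuration, so adjust them to your own setup.

```apache
# Hypothetical daemon process group name and script path.
WSGIDaemonProcess myapp
WSGIProcessGroup myapp

# Run the application in the main Python interpreter context
# instead of a sub interpreter.
WSGIApplicationGroup %{GLOBAL}

WSGIScriptAlias / /some/path/app.wsgi
```

This assumes only one WSGI application is delegated to the `myapp` daemon process group.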

Another possibility is that requests are getting stuck waiting on backend services, such as a database or an email service.

You should explore use of the request-timeout option as a way to force a restart of the daemon processes when you have requests that run much longer than they should.

https://modwsgi.readthedocs.io/en/master/configuration-directives/WSGIDaemonProcess.html?highlight=request-timeout

One thing this force restart mechanism will do is log stack traces of where Python threads are on process shutdown. This will help you work out where your application is getting stuck.
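
A minimal sketch of setting the option, assuming a daemon process group named `myapp`. Both the name and the 60 second value are illustrative placeholders, not recommendations. Pick a value comfortably above your slowest legitimate request, and check the linked documentation for how the timeout is evaluated when a process runs multiple threads.

```apache
# Hypothetical group name; 60 is an example threshold in seconds.
# When requests run too long, the daemon process is restarted and
# stack traces of the Python threads are logged on shutdown.
WSGIDaemonProcess myapp request-timeout=60
```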

Jun 03 '22 21:06 GrahamDumpleton

Hi Graham,

Thanks for your response and suggestions.

We have made the changes you suggested, and so far we have not observed any 504 errors in the past 24 hours.

Could you please provide more details about the "Python Simplified GIL State API"? For example, is there any downside to using the %{GLOBAL} application group compared with what we were using earlier, or is it completely safe to use with multiple WSGI applications running in daemon mode?

Regards Shishir

Jun 08 '22 11:06 shishir-22

The recommended method of deployment is to delegate each WSGI application to a distinct named daemon process group. Then in the respective daemon process groups force the WSGI application to run in the main Python interpreter context. Each WSGI application should be in a different daemon process group, as many WSGI frameworks don't allow more than one application instance to run in the same interpreter context. The main Python interpreter context is the same as if you had run Python from the command line and is not a sub interpreter. Thus the main Python interpreter context is guaranteed to work with all Python modules irrespective of whether they use the Python simplified GIL state API or not.

WSGIDaemonProcess app1
WSGIScriptAlias /app1 /some/path/app1.wsgi process-group=app1 application-group=%{GLOBAL}

WSGIDaemonProcess app2
WSGIScriptAlias /app2 /some/path/app2.wsgi process-group=app2 application-group=%{GLOBAL}

Jun 08 '22 21:06 GrahamDumpleton
