Set up KeepAliveTimeout
Hello,
I am currently seeing 502 errors, and the cause appears to be the connection between the ELB and the target. After checking the logs, I am confident that I need to make the server's idle timeout greater than the connection idle timeout of the load balancer. To do this I need to increase KeepAliveTimeout to 61 seconds, since the idle connection timeout on my load balancer is 60 seconds.
I am attaching my wsgi file. Could you please confirm whether the file is correct with KeepAliveTimeout added as follows?
KeepAlive On
KeepAliveTimeout 61
Timeout 60
Also, please let me know if any additional parameters are required that I might be missing here.
How long is the longest response time for your HTTP web request handlers?
If the load balancer has a fixed 60 second timeout on requests anyway, there is no point running a request-timeout of 60 seconds.
Next thing is that a keep alive timeout of 60 seconds is generally a very bad idea. This may have been fine decades ago when internet speeds were slow, but these days I would never advise more than a few seconds, maybe 2 seconds or at most 5 seconds.
Also, Timeout in Apache is not a request timeout, but a timeout on blocking waiting for more data over a connection, which is not the same thing. So just because Timeout is set to a specific value doesn't mean you can't have requests which take longer, only that there must be some flow of data within that timeout period. This is all pointless though if the load balancer has a fixed overall maximum request time of 60 seconds.
So you need to validate what the 60 second load balancer timeout really is. Is it a fixed maximum request time, or is it like Timeout, only dropping the connection if there is no traffic on the socket for that time? Having a high keep alive timeout can likely trigger that, but as I said, having a high keep alive timeout is not good practice these days.
Thanks for the quick response. Please find my answers below:
How long is the longest response time for your HTTP web request handlers?
The majority of our HTTP web request handlers respond well within a few seconds. However, to accommodate rare edge cases or longer-running operations, we’ve set the request-timeout in WSGIDaemonProcess to 60 seconds. In practice, over 99% of our requests complete in under 5 seconds.
What does the 60 seconds load balancer timeout really represent?
The 60-second timeout on the AWS Application Load Balancer (ALB) is an idle timeout, not a total request timeout. This means the connection will be closed only if no data is transmitted in either direction for 60 seconds. It does not terminate requests that take longer than 60 seconds, as long as there is some data flow during that period. We’ve confirmed this behavior from AWS documentation and from observing that longer-running responses work fine when data is streamed or flushed within the 60s window.
mod_wsgi Daemon Processes: Yes, I'm using multiple WSGIDaemonProcess directives, each configured with request-timeout=60. Example:
WSGIDaemonProcess myapp user=ec2-user threads=5 python-home=/my/venv request-timeout=60
These daemon processes handle requests from different API endpoints. The app logic typically completes well within 60 seconds, but I matched the request-timeout with the ALB timeout to avoid unexpected terminations. That said, based on your guidance and to reduce edge-case overlaps, I may consider reducing the request-timeout to 55s or lower too.
Error Origin:
request_processing_time = 0.0001
target_processing_time = 0.0000
response_processing_time = -1
elb_status_code = 502
target_status_code = -
Based on the above points, and coming back to the original question: is the wsgi file attached to this ticket, where I have set KeepAlive On, KeepAliveTimeout 61 and Timeout 60, right or wrong? I only need guidance on setting these directives.
Thanks, Atul
I already mentioned that such a high value for keep alive timeout is not a good idea. There is usually no good reason to have a high keep alive timeout these days as internet speeds are much better than they used to be. I would set keep alive timeout to 2 seconds and at most 5 seconds.
As to request-timeout, you possibly do not understand how it works in practice when you have a multi threaded mod_wsgi daemon process. Quoting the docs:
request-timeout=sss
Defines the maximum number of seconds that a request is allowed to run before the daemon process is restarted. This can be used to recover from a scenario where a request blocks indefinitely, and where if all request threads were consumed in this way, would result in the whole WSGI application process being blocked.
How this option is seen to behave is different depending on whether a daemon process uses only one thread, or more than one thread for handling requests, as set by the threads option.
If there is only a single thread, and so the process can only handle one request at a time, as soon as the timeout has passed, a restart of the process will be initiated.
If there is more than one thread, the request timeout is applied to the average running time for any requests, across all threads. This means that a request can run longer than the request timeout. This is done to reduce the possibility of interrupting other running requests, and causing a user to see a failure. So where there is still capacity to handle more requests, restarting of the process will be delayed if possible.
So in a multi threaded daemon process the request timeout may not trigger at the actual timeout value specified, as it is an average across all threads (in other words, it is based on accumulated concurrent request time across all threads). It is done this way to allow requests to still run longer if capacity exists.
Thus even though the load balancer may have a 60 second timeout, the request timeout may trigger only at a much greater value, which means the load balancer will trigger a gateway timeout of its own before the request timeout triggers.
If 99 percent of requests complete within 5 seconds and anything longer than that is regarded as abnormal, then you might consider setting request-timeout to 10 seconds instead of 60 seconds. With 5 threads per daemon process, that means at worst a single long running request which takes 50 seconds (an average of 10 seconds across five threads) would time out and fail before the 60 second load balancer timeout. You are thus less likely to see a gateway timeout, and you should see a Python stack trace logged on request timeout so you can debug why your requests are getting stuck and taking so long.
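As a rough sketch of that suggestion, here is the WSGIDaemonProcess line quoted earlier in this thread with only request-timeout changed. The process name, user and python-home are taken from that earlier example and are assumptions about the real setup; the 10 second value is a starting point to tune, not a verified setting.

```apache
# Sketch only: same daemon process definition as quoted earlier in the thread,
# with request-timeout lowered from 60 to 10.
# With threads=5 the timeout is averaged across threads, so a single stuck
# request could run for roughly 5 x 10 = 50 seconds before the process is
# restarted, which is still under the 60 second load balancer timeout.
WSGIDaemonProcess myapp user=ec2-user threads=5 python-home=/my/venv request-timeout=10
```

By the same averaging, the current request-timeout=60 with threads=5 could let a lone stuck request run for around 300 seconds, well past the load balancer timeout.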
Hey Graham,
Thanks for sharing the above points. I will definitely check this and follow the same.
However, sticking to the original question again: if I want to change the keep alive timeout (not to such a high value), are the keys below correct, and are there any other keys I need to take care of?
- KeepAlive
- KeepAliveTimeout
- timeout
Thanks, Atul
I already said, try using:
KeepAliveTimeout 2
Even the Apache docs say that the default for modern Apache httpd server is 5 seconds. 60 seconds was only used way back in Apache httpd 2.0 when the internet was much slower. A lot of separate docs and blogs on the internet are out of date when they say 60 seconds.
- https://httpd.apache.org/docs/2.4/mod/core.html#keepalivetimeout
KeepAliveTimeout Directive
Description: Amount of time the server will wait for subsequent requests on a persistent connection
Syntax: KeepAliveTimeout num[ms]
Default: KeepAliveTimeout 5
Context: server config, virtual host
Status: Core
Module: core
The number of seconds Apache httpd will wait for a subsequent request before closing the connection. By adding a postfix of ms the timeout can be also set in milliseconds. Once a request has been received, the timeout value specified by the Timeout directive applies.
Setting KeepAliveTimeout to a high value may cause performance problems in heavily loaded servers. The higher the timeout, the more server processes will be kept occupied waiting on connections with idle clients.
If KeepAliveTimeout is not set for a name-based virtual host, the value of the first defined virtual host best matching the local IP and port will be used.
If you correct the request-timeout value to 10 seconds, using 60 seconds for Timeout and the load balancer would be fine, and would only be a fail safe since request-timeout should kick in first anyway.
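Putting the values from this thread together, a minimal sketch of the relevant directives might look like the following. It assumes the directives sit in the server config or the virtual host that serves the application, and it reuses the daemon process name, user and python-home from the example earlier in the thread; adjust those to the real setup.

```apache
# Keep persistent connections enabled, but only wait briefly for the next
# request on an idle connection.
KeepAlive On
KeepAliveTimeout 2

# Timeout applies to blocking waits for data on a connection. It is not a
# limit on total request time.
Timeout 60

# Assumed daemon process definition, with request-timeout lowered so that it
# triggers before the 60 second load balancer timeout.
WSGIDaemonProcess myapp user=ec2-user threads=5 python-home=/my/venv request-timeout=10
```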
Not sure what else to tell you, as I have already said all this before in similar words.