apisix
bug: access upstream 502
Current Behavior
We find that APISIX sometimes returns a 502 Bad Gateway error when accessing an upstream, even though the upstream exists. Error rate:
- 1/42458 in 15 minutes
- 500+ times in 1 day
Expected Behavior
No response
Error Logs
No response
Steps to Reproduce
- access upstream through apisix
Environment
- APISIX version (run apisix version): 3.2.2
- Operating system (run uname -a): Linux apisix-9d7bb89b8-2cv6k 5.10.134-16.1.al8.x86_64 #1 SMP Thu Dec 7 14:11:24 UTC 2023 x86_64 GNU/Linux
- OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.21.4.1
- etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.5.10
- APISIX Dashboard version, if relevant:
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run luarocks --version):
We encountered a similar issue previously.
The root of the problem was that APISIX was using HTTP/1.1 to proxy requests to the upstream server, and the keepalive_timeout was set to 60 seconds:
proxy_http_version 1.1;
keepalive_timeout 60s;
If the upstream server has its own keepalive timeout, as Gunicorn in Python does (its keepalive timeout defaults to 2 seconds), this can lead to issues. On the APISIX <-> upstream connection, the upstream server closes an idle connection after 2 seconds. However, APISIX is unaware of this, and when it tries to reuse the closed connection for the next request, the result is a 502 error.
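The mismatch described above can be reproduced in miniature with plain sockets. This is only a sketch under stated assumptions: the function names are hypothetical, the timeouts are scaled down from 60 s and 2 s to fractions of a second, and the "proxy" is a bare client socket rather than APISIX itself.

```python
import socket
import threading
import time


def upstream_closes_idle(listener: socket.socket, idle_timeout: float) -> None:
    """Accept one connection; close it if no request arrives within idle_timeout."""
    conn, _ = listener.accept()
    conn.settimeout(idle_timeout)
    try:
        data = conn.recv(1024)
        if data:
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    except socket.timeout:
        pass  # idle for too long: the upstream drops the connection
    finally:
        conn.close()


def reuse_after_idle(proxy_idle: float, upstream_idle: float) -> bool:
    """Return True if a request sent after proxy_idle seconds still gets a 200."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    server = threading.Thread(
        target=upstream_closes_idle, args=(listener, upstream_idle)
    )
    server.start()
    client = socket.create_connection(listener.getsockname())
    try:
        time.sleep(proxy_idle)  # the connection sits idle in the proxy's pool
        client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
        return client.recv(1024).startswith(b"HTTP/1.1 200")
    except OSError:
        return False  # reset or broken pipe: the connection was already gone
    finally:
        client.close()
        server.join()
        listener.close()


if __name__ == "__main__":
    # Proxy holds the connection longer than the upstream does: reuse fails.
    print("proxy idle > upstream idle:", reuse_after_idle(0.3, 0.05))
    # Upstream holds the connection longer than the proxy idles: reuse works.
    print("proxy idle < upstream idle:", reuse_after_idle(0.05, 0.5))
```

In the failing case the client's write usually succeeds and only the read reveals that the peer is gone, which is why the proxy cannot tell in advance that the pooled connection is stale.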
Solutions to this issue include:
- Disabling keepalive on the upstream server.
- Or, setting the upstream server’s keepalive timeout to be greater than 60 seconds.
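For a Gunicorn upstream, the second option might look like the sketch below. Gunicorn reads its settings from a Python file, and keepalive is its setting for how long to wait for the next request on a keep-alive connection. The file name and the 15-second margin are assumptions chosen for illustration; the 60-second value is the APISIX-side timeout quoted above.

```python
# gunicorn.conf.py (hypothetical file name; values are illustrative)

# Idle timeout on the APISIX side, as quoted in this thread.
APISIX_KEEPALIVE_TIMEOUT = 60  # seconds

# Keep idle connections open longer than APISIX does, so that APISIX is
# always the side that closes first. The 15 s margin is an assumption.
keepalive = APISIX_KEEPALIVE_TIMEOUT + 15
```

Whether the setting takes effect can depend on the Gunicorn worker type in use, so it is worth confirming against a running instance.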
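We can't change the business service, so we set idle_timeout to 0 in the upstream configuration. Unfortunately the effect was not significant, and we still see 502s.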
@random-zhu
Some other solutions you can try:
- use retries for the upstream: https://apisix.apache.org/docs/apisix/admin-api/#request-body-parameters-4
- use nginx proxy_next_upstream for http_502;
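As a rough sketch of the first suggestion, the snippet below builds an upstream request body that sets retries and retry_timeout, two fields from the Admin API upstream parameters linked above. The node addresses and all numeric values are assumptions for illustration, not values from the reporter's setup, and the snippet only prints the body rather than sending it.

```python
import json

# All values here are illustrative assumptions.
upstream = {
    "type": "roundrobin",
    "nodes": {
        "10.0.0.1:8080": 1,  # hypothetical upstream nodes
        "10.0.0.2:8080": 1,
    },
    "retries": 2,        # number of retries when passing a request upstream
    "retry_timeout": 5,  # seconds after which no further retries are made
}

body = json.dumps(upstream, indent=2)
print(body)
```

Retries hide the stale-connection failure rather than remove it, and retrying non-idempotent requests may not be safe for every service, so this is probably best treated as a mitigation alongside aligning the keepalive timeouts.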