Increased sD requests after upgrade to 2.3.19 (from 2.3.14)
Detailed Description of the Problem
Since the update from 2.3.14 to 2.3.19, we've seen an increased number of terminations with status sD in HTTP mode. All of these connections have a response code set; AFAICS this means a response was already partially received. These requests are also logged in the backend (with a much smaller duration, several orders of magnitude below the server timeout).
Most of the affected requests are carried over HTTPS connections that have already served (sometimes thousands of) previous requests.
The issue seems to correlate weakly with relatively large responses.
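For reference, this is roughly how we count these states. In HAProxy's HTTP logs the termination state is a four-character field whose first two characters matter here ('s' means a server-side timeout expired, 'D' means the session was in the DATA phase). Since our log-format is redacted above, the sketch below is a hypothetical helper that takes the position of that field as an argument rather than assuming one:

```python
#!/usr/bin/env python3
"""Tally HAProxy termination states (e.g. sD) from log lines on stdin.

Hypothetical helper: our real log-format is redacted above, so the
position of the termination-state field is passed as an argument
instead of being assumed.
"""
import sys
from collections import Counter


def count_termination_states(lines, field_index):
    """Count the first two characters of the termination-state field
    (e.g. 'sD' = server-side timeout while the session was in the
    DATA phase)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > field_index:
            counts[fields[field_index][:2]] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 count_states.py <field-index> < haproxy.log
    idx = int(sys.argv[1])
    for state, n in count_termination_states(sys.stdin, idx).most_common():
        print(f"{state}\t{n}")
```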
Expected Behavior
No change in behavior after the update, i.e. no increase in sD terminations.
Steps to Reproduce the Behavior
We've been unable to reproduce this issue.
Do you have any idea what may have caused this?
No response
Do you have an idea how to solve the issue?
No response
What is your configuration?
In front of this HAProxy instance we have another HAProxy running in TCP mode, and it also logs these broken connections in `sD` state.
Config:
global
    nbthread 2
    cpu-map auto:1/1-2 0-1
    stats socket /state/haproxy-stats.sock level admin expose-fd listeners process all
    pidfile /state/haproxy.pid
    uid 2000
    gid 2000
    maxconn 200000
    tune.ssl.default-dh-param 2048
    max-spread-checks 5s
    master-worker
    log stderr format short daemon emerg warning

defaults
    log stdout len 10000 format raw daemon info info
    log-format '[..]'
    fullconn 500
    maxconn 10000
    option dontlognull
    option h1-case-adjust-bogus-server
    grace 20000
    #option http-server-close
    option http-keep-alive
    unique-id-format %{+X}o%ci:%cp:%fi:%fp_%rt_%Ts
    unique-id-header X-Unique-ID
    timeout check 10s
    timeout client 50s
    timeout client-fin 50s
    timeout connect 5s
    timeout http-keep-alive 5s
    timeout http-request 5s
    timeout queue 5s
    timeout server 1200s
    timeout server-fin 50s
    timeout tunnel 1h

frontend api
    mode http
    bind *:20024 ssl alpn h2,http/1.1 accept-proxy crt /api-ssl/sslparams.pem [...]
    capture request header Host len 40
    capture request header User-Agent len 150
    capture request header Referer len 100
    capture request header Origin len 150
    http-request set-var(req.backend) var(txn.alias),map(/config/current/alias.maps,_backend_404)
    use_backend app-%[var(req.backend)]
    [...]

backend app-c130440470
    mode http
    http-response set-var(txn.cuid) str(c130440470)
    server srv_1 10.101.116.114:80 no-check
Output of haproxy -vv
HA-Proxy version 2.3.19-0647791 2022/03/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2022.
Known bugs: http://www.haproxy.org/bugs/bugs-2.3.19.html
Running on: Linux 5.10.60.1-microsoft-standard-WSL2 #1 SMP Wed Aug 25 23:20:18 UTC 2021 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1
DEBUG =
Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=8).
Built with OpenSSL version : OpenSSL 1.1.1k 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with the Prometheus exporter as a service
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2
fcgi : mode=HTTP side=BE mux=FCGI
<default> : mode=HTTP side=FE|BE mux=H1
<default> : mode=TCP side=FE|BE mux=PASS
Available services : prometheus-exporter
Available filters :
[SPOE] spoe
[CACHE] cache
[FCGI] fcgi-app
[COMP] compression
[TRACE] trace
Last Outputs and Backtraces
No response
Additional Information
Customers see connections being closed that they expected to remain open (due to keep-alive).
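To illustrate what our users see (the hostname, path and client code below are placeholders, not our actual client): a client that reuses a keep-alive connection gets an exception on the next request when the connection has been closed underneath it, along these lines:

```python
#!/usr/bin/env python3
"""Illustrative sketch only: how an unexpected close of a keep-alive
connection surfaces on the client side. Host and path are placeholders."""
import http.client

# Placeholder host/port; real clients talk to the frontend described above.
conn = http.client.HTTPSConnection("api.example.com", 443, timeout=30)

for i in range(100):
    try:
        # Every request is sent on the same persistent (keep-alive) connection.
        conn.request("GET", "/some/endpoint")
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection can be reused
    except (http.client.RemoteDisconnected, ConnectionResetError, BrokenPipeError) as exc:
        # This is the kind of exception our users report: the connection was
        # closed even though the client expected to keep using it.
        print(f"request {i}: connection unexpectedly closed: {exc!r}")
        break
```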
Hello,
Do you know if your users have noticed anything? I'm wondering whether it's a mistaken status produced when writing the log report, or whether something in the lower layers really provokes a timeout (e.g. part of the response being buffered and not delivered for whatever reason).
Yes, our users notified us that they received exceptions. The exceptions indicated an unexpected connection closure when the client expected to be able to use this connection again (keep-alive).
Interesting, so that means that at least the content was delivered but that haproxy thought there was a late problem on the connection. That still doesn't tell us where the problem is but it helps narrow it down a little bit.
Note that we've recently fixed two issues related to truncation/a missing end with chunks. The code is still only in 2.6-dev at the moment, but once we backport it you might be interested in giving it a try.
Another point worth mentioning: as you've likely noticed, 2.3 is reaching end-of-life (planned for Q1 2022), so there will likely be a 2.3.20 to close pending issues and no more 2.3 releases after that. It might be time to give 2.4 a try while you still have the option to roll back in case of trouble, before it's too late.
FYI, 2.4.17 was released a few days ago. Give it a try to check whether the issue is still there. 2.3.20 was also released, but 2.3 is no longer maintained, so unless there is proof that the bug exists on a maintained version, we will not work on it.
Any news about this issue?
I'm closing this due to inactivity. Feel free to reopen with more info if necessary. Thanks!