Increased sD requests after upgrade to 2.3.19 (from 2.3.14)

Open thriqon opened this issue 3 years ago • 6 comments

Detailed Description of the Problem

Since the update from 2.3.14 to 2.3.19, we've seen increased terminations with status sD in HTTP mode. All these connections have a response code set; AFAICS this means that the response has already been partially received. The backend also logs these requests, with a much smaller duration (several orders of magnitude below the server timeout).

Most of the affected requests arrive over HTTPS connections that have already carried previous requests (sometimes thousands).

It seems to correlate weakly with relatively big responses.
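For readers unfamiliar with the two-letter code, here is a minimal sketch of how the first two characters of the termination state can be decoded. The wording of each meaning is paraphrased from the "Session state at disconnection" section of the HAProxy configuration manual and only a subset of codes is listed; the function name `describe_termination_state` is made up for this illustration and is not part of HAProxy.

```python
# Subset of the termination-state codes, paraphrased from the HAProxy
# configuration manual. Codes not listed here are reported as unknown.
FIRST_CHAR = {
    "C": "TCP session aborted by the client",
    "S": "TCP session aborted or refused by the server",
    "P": "session aborted by the proxy",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal session completion",
}
SECOND_CHAR = {
    "R": "waiting for a complete request from the client",
    "Q": "waiting in the queue for a connection slot",
    "C": "waiting for the connection to the server to establish",
    "H": "waiting for complete response headers from the server",
    "D": "session was in the DATA phase",
    "L": "transmitting the last data to the client",
    "T": "request was tarpitted",
    "-": "normal completion after end of data transfer",
}


def describe_termination_state(state: str) -> str:
    """Explain the first two characters of a termination state.

    In HTTP mode the logged field has four characters (e.g. "sD--");
    the last two relate to cookie persistence and are ignored here.
    """
    if len(state) < 2:
        raise ValueError("termination state needs at least two characters")
    cause = FIRST_CHAR.get(state[0], "unknown cause")
    phase = SECOND_CHAR.get(state[1], "unknown session phase")
    return f"{state[:2]}: {cause}; {phase}"


print(describe_termination_state("sD"))
```

Under this reading, sD means the server-side timeout fired while the response body was being transferred, which is why a backend duration far below `timeout server` looks inconsistent.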

Expected Behavior

No change in the rate of sD terminations as a result of the update.

Steps to Reproduce the Behavior

We've been unable to reproduce this issue.

Do you have any idea what may have caused this?

No response

Do you have an idea how to solve the issue?

No response

What is your configuration?

In front of this HAProxy instance, we have another HAProxy in TCP mode, which also logs broken connections with the `sD` state.

Config:

global
    nbthread 2
    cpu-map auto:1/1-2 0-1
    stats socket /state/haproxy-stats.sock level admin expose-fd listeners process all
    pidfile /state/haproxy.pid
    uid 2000
    gid 2000
    maxconn 200000
    tune.ssl.default-dh-param 2048
    max-spread-checks 5s
    master-worker
    log stderr format short daemon emerg warning

defaults
    log stdout len 10000 format raw daemon info info
    log-format '[..]'
    fullconn 500
    maxconn 10000
    option dontlognull
    option h1-case-adjust-bogus-server
    grace 20000
    #option http-server-close
    option http-keep-alive
    unique-id-format %{+X}o%ci:%cp:%fi:%fp_%rt_%Ts
    unique-id-header X-Unique-ID
    timeout check           10s
    timeout client          50s
    timeout client-fin      50s
    timeout connect         5s
    timeout http-keep-alive 5s
    timeout http-request    5s
    timeout queue           5s
    timeout server          1200s
    timeout server-fin      50s
    timeout tunnel          1h

frontend api
  mode http

  bind *:20024 ssl alpn h2,http/1.1 accept-proxy crt /api-ssl/sslparams.pem [...]

  capture request header Host len 40
  capture request header User-Agent len 150
  capture request header Referer len 100
  capture request header Origin len 150

  http-request set-var(req.backend) var(txn.alias),map(/config/current/alias.maps,_backend_404)
  use_backend app-%[var(req.backend)]
  
[...]

backend app-c130440470
  mode http
  http-response set-var(txn.cuid) str(c130440470)
  server srv_1 10.101.116.114:80 no-check

Output of haproxy -vv

HA-Proxy version 2.3.19-0647791 2022/03/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2022.
Known bugs: http://www.haproxy.org/bugs/bugs-2.3.19.html
Running on: Linux 5.10.60.1-microsoft-standard-WSL2 #1 SMP Wed Aug 25 23:20:18 UTC 2021 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=8).
Built with OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with the Prometheus exporter as a service
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services : prometheus-exporter
Available filters :
        [SPOE] spoe
        [CACHE] cache
        [FCGI] fcgi-app
        [COMP] compression
        [TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

Customers receive connection closures that were expected to be kept open (due to keep-alive).
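A rough way to look for this symptom from the client side is to push many requests through a single keep-alive connection and note when it stops being reusable. The sketch below uses Python's standard `http.client`; the helper name `count_requests_until_close`, the path, and the request count are assumptions for illustration, and nothing here is known to reproduce the issue.

```python
import http.client


def count_requests_until_close(conn, path="/", max_requests=1000):
    """Send GETs over one connection and count those that complete.

    Stops at the first error (for example the peer dropping the
    connection) or when a response announces that the connection
    will be closed. Returns the number of completed requests.
    """
    completed = 0
    for _ in range(max_requests):
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
        except (http.client.HTTPException, OSError):
            # Includes RemoteDisconnected, resets and socket timeouts.
            break
        completed += 1
        if resp.will_close:
            # The server asked to close; http.client would silently
            # reconnect on the next request, hiding the closure.
            break
    return completed
```

It could be called with something like `http.client.HTTPSConnection("proxy.example.test", 20024, timeout=30)`, where the host name is a placeholder. A result below `max_requests` only shows that the connection was closed early; the HAProxy log entry for the last request is still needed to tell an sD termination from an ordinary keep-alive timeout.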

thriqon, Mar 23 '22 10:03

Hello,

Do you know if your users have noticed anything? I'm wondering if it's a mistaken status when producing the log report or if something in the lower layers really provokes a timeout (e.g. part of the response being buffered and not delivered for whatever reason).

wtarreau, Mar 23 '22 10:03

Yes, our users notified us that they received exceptions. The exceptions indicated an unexpected connection closure when the client expected to be able to use this connection again (keep-alive).

thriqon, Mar 23 '22 11:03

Interesting, so that means that at least the content was delivered but that haproxy thought there was a late problem on the connection. That still doesn't tell us where the problem is but it helps narrow it down a little bit.

wtarreau, Mar 23 '22 13:03

Note that we've recently fixed two issues related to truncation/missing end with chunks. The code is still in 2.6-dev only at the moment but once we backport that you could be interested in giving it a try.

wtarreau, Apr 14 '22 19:04

Also a point I wanted to mention is that as you've likely noticed, 2.3 is reaching end-of-life (was planned Q1 2022), so there will likely be a 2.3.20 to close pending issues and no more 2.3 after. It might be time to give 2.4 a try and still have the option to roll back in case of trouble before it's too late.

wtarreau, Apr 20 '22 05:04

FYI, 2.4.17 was released a few days ago. Give it a try to check whether the issue is still there. 2.3.20 was also released, but 2.3 is no longer maintained. So if there is no proof that the bug exists on a maintained version, we will not work on it.

capflam, May 17 '22 16:05

Any news about this issue?

capflam, Aug 25 '22 10:08

I'm closing because of inactivity. Feel free to reopen with more info if necessary. Thanks!

capflam, Sep 12 '22 08:09