haproxy 2.6.12/2.6.22 - Debian 11 - Erratic SNI forwarding on multi-certificate configuration

Detailed Description of the Problem

Hello,

Since deploying the Apache2 2.4.65 update, we’ve been experiencing unexplained, erratic 421 errors on about 0.03% of incoming requests (excluding the 0.02% of requests without SNI - those 421s are expected).

We are running the official version shipped with Debian 11: HAProxy 2.6.12.

The SNI values are correctly logged in HAProxy - they are properly provided by the clients in the audited cases:

log-format "%ci:%cp [%t] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %ts %ac/%fc/%bc/%sc/%rc %sq/%bq "%r" SNI=%[ssl_fc_sni]"

Sep 24 07:37:49 localhost haproxy[2092738]: 20.171.207.51:55588 [24/Sep/2025:07:37:48.251] https~ https/X 0/0/0/11/1191 421 453 - - -- 82/82/2/2/0 0/0 "GET X HTTP/2.0" SNI=www.hostname1.com => SNI OK

However, the Apache2 backend does not seem to receive this SNI forwarded via sni req.hdr(host),host_only in about 0.03% of cases. We are also logging the value received by Apache2, and the environment variable %{SSL_TLS_SNI} is empty:

20.171.207.51 - - [24/Sep/2025:07:37:49 +0200] "GET /X/ HTTP/2.0" 421 HOST:www.hostname1.com SNI:- (SNI EMPTY - www.hostname1.com expected)

After numerous failed attempts, we’re trying our luck here to see if this is a known bug in the official version shipped with Debian 11 (and Debian 12).

If someone can help us, we’re willing to pay for an audit.

Thank you for your help,

Have a nice day,

Vincent

Expected Behavior

We need SNI to be forwarded every time.

Steps to Reproduce the Behavior

We have no way to reproduce this.

Do you have any idea what may have caused this?

No response

Do you have an idea how to solve the issue?

No response

What is your configuration?

HAProxy 2.6.12 on Debian 11 + Apache2 2.4.65

Output of `haproxy -vv`

HAProxy version 2.6.12-1~bpo11+1 2023/04/01 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.12.html
Running on: Linux 5.10.0-35-amd64 #1 SMP Debian 5.10.237-1 (2025-05-19) x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -QUIC +RT +SLZ -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=12).
Built with OpenSSL version : OpenSSL 1.1.1n  15 Mar 2022
Running on OpenSSL version : OpenSSL 1.1.1w  11 Sep 2023 (VERSIONS DIFFER!)
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

Last Outputs and Backtraces

Additional Information

No response

Sep 24 '25 06:09 touchweb-vincent

Just to be sure, because it is not obvious to me: are you sure there is a valid Host header value in the request? I'm asking because the Host header value is not dumped in the haproxy log. You can capture it with the `capture request header Host len 64` directive, for instance.

Sep 24 '25 06:09 capflam

In addition, 2.6.12 is quite old. It would be worth evaluating 2.6.22 (https://haproxy.debian.net/#distribution=Debian&release=bullseye&version=2.6).

Sep 24 '25 06:09 capflam

Hello @capflam, thanks for the quick reply. I have edited the logging format. I will update this thread once the issue occurs again.

Note that we did receive the correct Host header on the Apache2 backend (www.hostname1.com), and it is correctly logged. I have also updated the logs in the original post.

Sep 24 '25 07:09 touchweb-vincent

Here is another log :

HAProxy :

Sep 24 09:19:43 localhost haproxy[2112253]: 40.77.179.34:18882 [24/Sep/2025:09:19:42.896] https~ https/X 0/0/0/12/180 421 453 - - -- 121/121/1/1/0 0/0 "POST /X HTTP/2.0" SNI=www.hostname1.com HOST=www.hostname1.com

Apache2 :

40.77.179.34 - - [24/Sep/2025:09:19:43 +0200] "POST /X HTTP/2.0" 421 HOST:www.hostname1.com SNI:-

Sep 24 '25 07:09 touchweb-vincent

Ok thanks. There does seem to be something strange. But 2.6.12 is too old to investigate. You should update to 2.6.22. That does not mean it will fix your issue, but it is far easier (from my point of view, of course) than exploring more than 500 bug fixes to figure out whether it is a known issue that has already been fixed.

Sep 24 '25 07:09 capflam

I understand - we are upgrading from version 2.6.12 to 2.6.22 and I will get back to you in the coming days.

Thank you for your help.

Sep 24 '25 09:09 touchweb-vincent

We still have the issue on 2.6.22 :

HAProxy version 2.6.22-1~bpo11+1 2025/04/26 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.22.html
Running on: Linux 5.10.0-35-amd64 #1 SMP Debian 5.10.237-1 (2025-05-19) x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = x86_64-linux-gnu-gcc
  CFLAGS  = -O2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -QUIC +RT +SLZ -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=12).
Built with OpenSSL version : OpenSSL 1.1.1w  11 Sep 2023
Running on OpenSSL version : OpenSSL 1.1.1w  11 Sep 2023
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available services : prometheus-exporter
Available filters :
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

HaProxy Log :

Sep 24 15:17:05 localhost haproxy[2098914]: 86.241.147.74:59575 [24/Sep/2025:15:17:05.633] https~ https/https1 0/0/0/13/13 421 453 - - -- 5/5/0/0/0 0/0 "GET /X HTTP/2.0" SNI=www.hostname1.com HOST=www.hostname1.com

Apache2 log :

86.241.147.74 - - [24/Sep/2025:15:17:05 +0200] "GET /X HTTP/2.0" 421 WL:"0" HOST:www.hostname1.com SNI:-

0.025% 421 for the moment
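For anyone who wants to track this rate the same way, here is a minimal Python sketch. It assumes log lines shaped like the HAProxy excerpts in this thread (five slash-separated timers, then the status code, then an `SNI=` field, quoted or not), and it assumes an absent SNI is logged as an empty value or `-`. The function name `unexpected_421_rate` is made up for illustration.

```python
import re

# Matches "<timers> <status> ... SNI=<value>" in the log-format used in this thread.
# Assumption: timers are five integers separated by "/", possibly negative.
LINE_RE = re.compile(
    r'\s-?\d+/-?\d+/-?\d+/-?\d+/-?\d+\s+(?P<status>\d{3})\s.*?\sSNI="?(?P<sni>[^"\s]*)'
)

def unexpected_421_rate(lines):
    """Return the percentage of requests answered 421 although an SNI was logged."""
    total = 0
    unexpected = 0
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a request line in the expected format
        total += 1
        # Assumption: a missing SNI shows up as "" or "-"; those 421s are expected.
        if m.group("status") == "421" and m.group("sni") not in ("", "-"):
            unexpected += 1
    return (unexpected / total * 100) if total else 0.0
```

Feeding it the lines of one log file, for example `unexpected_421_rate(open(path))` with `path` pointing at the HAProxy log, gives a figure comparable to the percentages quoted here, provided the format assumptions hold.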

I will try tomorrow on the 3.0 LTS branch.

Sep 24 '25 13:09 touchweb-vincent

Hello,

Since upgrading to 3.0.11, we’ve observed another strange SNI behavior (statistics coming on Sunday - currently <0.015%). It now affects the first certificate being served, which was not the case before: on 2.6 we only had rare problems on secondary certificates:

HAProxy: Sep 25 07:36:40 localhost haproxy[2300971]: X.142:57837 [25/Sep/2025:07:36:40.915] https~ https/X 0/0/0/22/64 421 453 - - -- 76/76/1/1/0 0/0 "POST X HTTP/2.0" SNI=www.hostname1.com HOST=www.hostname1.com (first certificate on the server – we never had any SNI issues with the first certificate on the 2.6 LTS branch)

Apache2: X - - [25/Sep/2025:07:36:40 +0200] "POST /X HTTP/2.0" 421 HOST:wwwhostname1.com SNI:-

Apache-Error: [file "ssl_engine_kernel.c"] [line 325] [level 3] AH02032: Hostname www.hostname2.com provided via SNI and hostname www.hostname1.com provided via HTTP have no compatible SSL setup

It’s as if everything is getting completely mixed up.

Sep 25 '25 05:09 touchweb-vincent

Thanks for the test! I think I'm starting to understand, at least the last error. The backend connection is reused for different requests, with different hostnames. However, because the SNI is part of the hash value used to select an idle connection, my guess is that the client does the same. I mean, it opens a connection with an SNI and the corresponding hostname for the first request, then sends a second request with a different hostname on the same connection. There is nothing illegal here as long as the hostnames match a name in the certificate.

It is interesting to note that the last error message suggests there is an SNI set for the connection (www.hostname2.com), but the log line just above claims there was no SNI. So my guess is that Apache logs an empty SNI for reused connections.

Sep 25 '25 06:09 capflam

Yes, we also suspected that at first. We reluctantly set http-reuse never on the HAProxy side, but on the 2.6 branch we still observed the same erratic 421 errors.

It will be a pain if we have to reduce H2 multiplexing.

If you have any other ideas, we’re open to them.

Do you include in your update testing protocol a stack with Apache2 as the backend in HTTP mode?

Sep 25 '25 07:09 touchweb-vincent

Hum, after thinking about it twice, it remains strange. There is something unexplained because it should work at first glance.

Could you share your config to be sure?

Then it would be good to add more information to your logs. In haproxy, could you add the source port with %[bc_src_port]? You could also dump the %[bc_reused] value. This could help to match the corresponding Apache logs. For the same purpose, it would be good to add a unique identifier with the following directives:

unique-id-format %[uuid()]
unique-id-header X-Unique-ID

You should then add %ID in your haproxy logs. And then dump this header value in your apache logs. The idea is to be able to track all requests sent on one server connection and have more information about the SNI used.

Sep 25 '25 09:09 capflam

Thank you for the reply.

I cannot share these configurations. It's a common stack with HAProxy and Apache2 on separate servers, with TLS 1.2/1.3 support on both sides (TLS everywhere) and multiple professional Sectigo certificates. We use HTTP mode for IDS considerations.

We have tried dozens of configuration permutations over 4 days, and we still have erratic 421 errors on the 3.0 LTS branch (3.0.11). They occur only on medium-load servers (~100,000 calls per hour); this behaviour has not been observed on low-traffic servers, which very probably points to H2, possibly on the Apache2 side.

After switching to 3.0 LTS, 421s jumped to 0.2% of the traffic on these medium-load servers (a tenfold increase), which forced us to set http-reuse never again. Erratic 421s then dropped to 0.02% of the traffic, somewhat less than on the 2.6 LTS branch (2.6.22).

We cannot disable h2 because this causes other critical errors on TLS 1.3 and we cannot disable TLS 1.3.

[Fri Sep 26 08:34:45 2025] [trace3] [pid 59877] ssl_engine_kernel.c(2131): [client 212.83.148.203:48502] OpenSSL: Read: TLSv1.3 early data
[Fri Sep 26 08:34:45 2025] [trace3] [pid 59877] ssl_engine_kernel.c(2155): [client 212.83.148.203:48502] OpenSSL: Exit: error in error

I don't know how to dump this : %[bc_src_port] / %[bc_reused]

These two attempts both fail with critical errors:

log-format "%ci:%cp [%t] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %ts %ac/%fc/%bc/%sc/%rc %sq/%bq "%r" SNI="%[ssl_fc_sni]" HOST="%[var(txn.host)]" UID="%[var(txn.btw_uid)]" BC_SRC_PORT="%[bc_src_port]" BC_REUSED="%[bc_reused]""

failed to parse log-format : failed to parse sample expression <bc_reused]"> : unknown fetch method 'bc_reused'.

Or :

http-request set-var(txn.bc_src_port) %[bc_src_port]
http-request set-var(txn.bc_reused) %[bc_reused]

log-format "%ci:%cp [%t] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %ts %ac/%fc/%bc/%sc/%rc %sq/%bq "%r" SNI="%[ssl_fc_sni]" HOST="%[var(txn.host)]" UID="%[var(txn.btw_uid)]" BC_SRC_PORT="%[var(txn.bc_src_port)]" BC_REUSED="%[var(txn.bc_reused)]""

error detected in backend 'https' while parsing 'http-request set-var(txn.bc_src_port)' rule : missing fetch method.

If you have any ideas.

Thank you

Sep 26 '25 08:09 touchweb-vincent

Sorry, I checked and bc_reused is only available since 3.2. At least, you can dump the source port. With the uuid, it should help.

About your haproxy configuration, I guess you can remove all sensitive parts. The important information is the server and default-server lines from your defaults and backend sections. The order is also important, to be sure there is no strange combination. It is not impossible that there are uncovered cases in the configuration parsing. More generally, it would be useful to share all options configuring the backend and its servers.

Sep 26 '25 08:09 capflam

Another important point. If the issue still occurs when http-reuse never is set, it is not a bug about the lookup of idle connections. It eliminates a huge part of issues. I cannot explain why the issue is more visible when connections are reused. But at least we know it seems to happen on fresh connections.

Sep 26 '25 08:09 capflam

And to confirm and complete what Christopher said above, we don't need (and don't want you to share) anything confidential. Object names, IPs, filtering rules etc are irrelevant to the issue, so feel free to redact anything you don't want to share.

Sep 26 '25 08:09 wtarreau

OK here is a cleaned-up configuration, but as faithful as possible :

global

    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

defaults

    option abortonclose
    option httpclose
    option http-server-close

frontend https
    mode http

    bind *:443 ssl crt-list /etc/ssl/private/X alpn h2,http1.1

    default_backend https

    http-request set-var(txn.host) hdr(Host)

    # added on 09/25/2025
    http-request set-var(txn.sni) ssl_fc_sni,lower
    http-request set-header Connection close if !{ var(txn.host) -m str %[var(txn.sni)] } !{ var(txn.sni) -m end %[var(txn.host)] }

    log-format "%ci:%cp [%t] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %ts %ac/%fc/%bc/%sc/%rc %sq/%bq \"%r\" SNI=\"%[ssl_fc_sni]\" HOST=\"%[var(txn.host)]\" UID=\"%[var(txn.btw_uid)]\""

backend https
    mode http

    http-request set-var(txn.uid) uuid()
    http-request add-header X-UID %[var(txn.btw_uid)]

    http-reuse never

    server httpsX X:X ssl sni req.hdr(host),host_only verify none source X alpn h2,http1.1

Sep 26 '25 08:09 touchweb-vincent

Interesting to see that you already keep a copy of host in txn.host. This makes me think that in the case where any rule or Lua code (I don't know if you use any) would use set-uri, it could rewrite the host part and make it disappear from the headers while still present in the log. Similarly, if host_only() has a bug that would make it return an empty field, we couldn't see it here. Maybe something easy would be to keep a copy of the trimmed down host in your log, by pursuing on top of your existing variables:

   http-request set-var(txn.host) hdr(Host)
   http-request set-var(txn.hostonly) var(txn.host),host_only
   ...
   log-format "... hostonly=%[var(txn.hostonly)]"
   ...
   server httpsX X:X ssl sni var(txn.hostonly) verify none source X alpn h2,http1.1

This way we're certain to use the same input all along, and if Apache still sees an empty SNI while the logs do not report an empty hostonly, it's certain that it's either the "sni" directive which occasionally fails, idle connection pickup which fails, or maybe even Apache having difficulties occasionally retrieving an SNI over a reused connection (but why, if it works with reuse never?).

Sep 26 '25 09:09 wtarreau

I've also checked whether host_only() could fail. But I'm not seeing how. It calls sample_conv_str2lower() at the end, which in turn calls smp_make_rw(), which calls smp_dup() which could fail, but not for strings on input, which is what we have here. And I'm not seeing how str2lower could turn the entry to a shorter string than found on input for example.

Sep 26 '25 09:09 wtarreau

@wtarreau Thanks for your feedback.

I have edited configurations accordingly :

HaProxy :

Sep 26 11:44:14 localhost haproxy[2669266]: 66.249.74.76:58084 [26/Sep/2025:11:44:12.298] https~ https/https1 3/0/18/23/2048 421 453 - - -- 117/117/2/2/1 0/0 "POST /X HTTP/1.1" SNI="www.hostname1.com" HOST="www.hostname1.com" HOSTONLY="www.hostname1.com" UID="1e3cfc98-a2a9-4f77-ad12-12918b54301d"

Apache2 :

66.249.74.76 - - [26/Sep/2025:11:44:14 +0200] "POST /X HTTP/2.0" 421 HOST:www.hostname1.com UID:"1e3cfc98-a2a9-4f77-ad12-12918b54301d" SNI:"-"

Apache2 auditlog :

--08ccac13-A--
[26/Sep/2025:11:44:14 +0200] aNZgbjdnfMHjqwsAzcDRkAAAjgI 66.249.74.76 48638 X X
--08ccac13-B--
POST /X HTTP/2.0
Host: www.hostname1.com
X-Uid: 1e3cfc98-a2a9-4f77-ad12-12918b54301d

--08ccac13-H--
Apache-Error: [file "ssl_engine_kernel.c"] [line 325] [level 3] AH02032: Hostname media.hostname2.com provided via SNI and hostname www.hostname1.com provided via HTTP have no compatible SSL setup

Note that in this case, the SNI hostname was not the first declared host but one configured within non-default vhosts, which confirms that it completely mixes things up in certain scenarios.
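The manual correlation above (same UID on both sides, then compare what each side saw) can be automated. Below is a minimal Python sketch, assuming the two access-log layouts shown in this post: `SNI="…" HOST="…" HOSTONLY="…" UID="…"` on the HAProxy side and `HOST:… UID:"…" SNI:"…"` on the Apache2 side. The function names are hypothetical, and the regular expressions would need adjusting for any other log format.

```python
import re

# Field layouts taken from the two log excerpts in this post (assumed stable).
HAPROXY_RE = re.compile(
    r'SNI="(?P<sni>[^"]*)".*?HOSTONLY="(?P<hostonly>[^"]*)".*?UID="(?P<uid>[^"]*)"'
)
APACHE_RE = re.compile(
    r'HOST:(?P<host>\S+)\s+UID:"(?P<uid>[^"]*)"\s+SNI:"(?P<sni>[^"]*)"'
)

def index_by_uid(lines, pattern):
    """Map each UID found in `lines` to the named fields of its log line."""
    entries = {}
    for line in lines:
        m = pattern.search(line)
        if m:
            entries[m.group("uid")] = m.groupdict()
    return entries

def find_sni_mismatches(haproxy_lines, apache_lines):
    """Return (uid, value HAProxy used for sni, SNI Apache logged) for each mismatch."""
    haproxy = index_by_uid(haproxy_lines, HAPROXY_RE)
    apache = index_by_uid(apache_lines, APACHE_RE)
    mismatches = []
    for uid, seen_by_apache in apache.items():
        sent = haproxy.get(uid)
        if sent is None:
            continue  # request not found on the HAProxy side
        if seen_by_apache["sni"] != sent["hostonly"]:
            mismatches.append((uid, sent["hostonly"], seen_by_apache["sni"]))
    return mismatches
```

Each tuple it returns gives a UID to look up in the audit log, which is where the SNI Apache2 actually complained about (media.hostname2.com in the case above) shows up.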

Sep 26 '25 10:09 touchweb-vincent

Thank you. For now I see no way how this could be explained and am very surprised it's reported only now given how serious this is, but at least we're certain to work with the same strings all along. We'll have to dig in the idle conns reuse code to figure what could explain this.

Sep 26 '25 10:09 wtarreau

Since this is directly related to Apache2 2.4.65, which was released less than two months ago - and less than a month ago on Debian 12 - with the freeze phases usually observed by managed service providers lasting from two weeks to a month, and given that it only affects a negligible fraction of traffic, it is very likely that very few people have noticed it - and perhaps even fewer have taken the time to address or report it.

It should also be noted that this only affects infrastructures with multiple professional certificates, which further reduces the number of potentially impacted cases.

Sep 26 '25 10:09 touchweb-vincent

You're right, these are indeed good points to keep in mind.

Sep 26 '25 11:09 wtarreau

I'm still puzzled by the fact that, with no reuse, the issue is still there. @touchweb-vincent, could you run another test with connection reuse disabled? It could be useful to inspect the Apache audit log in that case.

With no reuse, only private connections, attached to the session, can be reused. If there is still another SNI provided, it means a private connection was erroneously reused.

Sep 26 '25 13:09 capflam

Honestly, this also bothers me a lot, erratic non-reproducible issues are really a pain, and the older we get, the less tolerance we have for them.

I didn’t understand your proposal: "could you make another test disabling the reuse of connections?" - what exactly are you expecting, please?

We are already using: option httpclose / option http-server-close and http-reuse never.

Sep 26 '25 13:09 touchweb-vincent

Ah. So the Apache audit log above was emitted with these options. That's important information, because with these options and an H1 connection on the client side, keep-alive is disabled on the client side and there is no reuse at all on the server side (private or not). So it is normally impossible to get a server connection already opened with another SNI, unless the expression is badly evaluated by HAProxy.

It remains possible the bug is on Apache side. Have you ever experienced this issue on an older Apache version ?

Sep 26 '25 13:09 capflam

We have always had a small residual amount of 421 errors, but it has never exceeded 0.0001% of the traffic, so we had never really paid attention to it.

Sep 26 '25 13:09 touchweb-vincent

Ok, so it is not a fully new issue. Back to the code then...

Sep 26 '25 13:09 capflam

Then at such error rates it would be useful to run a tcpdump between haproxy and the servers. For this you'll need to have the outgoing source port in haproxy logs in order to quickly discriminate the traffic, e.g:

http-request set-var(txn.bc_src_port) bc_src_port
log-format "%ci:%cp [%t] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Tt %ST %B %CC %CS %ts %ac/%fc/%bc/%sc/%rc %sq/%bq %r SNI=%[ssl_fc_sni] HOST=%[var(txn.host)] UID=%[var(txn.btw_uid)] BC_SRC_PORT=%[var(txn.bc_src_port)]"

Then with apache's audit log, the date and source port in haproxy, you can filter that, either using tcpdump or directly within wireshark. Looking at the TLS handshake, you should see what SNI is emitted. We'll see if there's none, an empty one, garbage, or a valid one.
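To illustrate what to look for in the capture, here is a minimal Python sketch that reads the server_name extension out of a raw ClientHello. It assumes the bytes passed in are one complete, unfragmented TLS handshake record, already extracted from the TCP stream (the extraction from the pcap is not shown), and it only returns the first entry of type host_name. The function name `extract_sni` is made up for this example. Wireshark shows the same field without any code.

```python
import struct

def extract_sni(record: bytes):
    """Return the SNI host name carried by a ClientHello record, or None if absent."""
    # TLS record header: content type (1 byte, 0x16 = handshake), version (2), length (2)
    if len(record) < 5 or record[0] != 0x16:
        raise ValueError("not a TLS handshake record")
    body = record[5:5 + struct.unpack("!H", record[3:5])[0]]
    # Handshake header: message type (1 byte, 0x01 = ClientHello), length (3)
    if len(body) < 4 or body[0] != 0x01:
        raise ValueError("not a ClientHello")
    pos = 4
    pos += 2 + 32                                            # client_version + random
    pos += 1 + body[pos]                                     # session_id
    pos += 2 + struct.unpack("!H", body[pos:pos + 2])[0]     # cipher_suites
    pos += 1 + body[pos]                                     # compression_methods
    if pos + 2 > len(body):
        return None                                          # no extensions block at all
    ext_end = pos + 2 + struct.unpack("!H", body[pos:pos + 2])[0]
    pos += 2
    while pos + 4 <= ext_end:
        ext_type, ext_len = struct.unpack("!HH", body[pos:pos + 4])
        pos += 4
        if ext_type == 0x0000 and ext_len >= 5:              # server_name extension
            # Layout: list length (2), name type (1), name length (2), name
            name_type = body[pos + 2]
            name_len = struct.unpack("!H", body[pos + 3:pos + 5])[0]
            if name_type == 0:                               # host_name
                return body[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

A `None` result would correspond to the "none" case, an empty string to the "empty one" case, and anything else can be compared against the Host header logged for the same source port.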

Sep 26 '25 14:09 wtarreau

This Apache2 error log is quite telling:

--08ccac13-H-- Apache-Error: [file "ssl_engine_kernel.c"] [line 325] [level 3] AH02032: Hostname media.hostname2.com provided via SNI and hostname www.hostname1.com provided via HTTP have no compatible SSL setup

It could not have invented media.hostname2.com - it is not the default vhost, it is a deep vhost. There must necessarily have been an unwanted reuse of a connection in a special race condition.

The remaining question is whether the issue lies with HAProxy or with Apache2.

I encourage you to build a stack like that and then attack it with siege to see if you can reproduce these anomalies.

Sep 26 '25 14:09 touchweb-vincent

> It could not have invented media.hostname2.com - it is not the default vhost, it is a deep vhost. There must necessarily have been an unwanted reuse of a connection in a special race condition.

I totally agree, and that's why we're trying to figure where it lies. We don't even know if it matches the first request of the connection or another one. We don't know if it happens on resumed connections or newly negotiated connections. It could even be possible that apache finds it in its tls context on resumed connections for example. There are lots of possibilities.

> The remaining question is whether the issue lies with HAProxy or with Apache2.

For now we're assuming it's in haproxy though we can't find any relevant possibility in the code we've read and re-read many times. The possibility that it's in Apache is there as well, but similarly there are so many ways to (re)use a connection over TLS that the number of scenarios to imagine is complex. To give you an idea, I have even checked the connection retry code just in case it would only happen after a failure and a retry. But I noticed on one log you produced that the retry counter was zero.

> I encourage you to build a stack like that and then attack it with siege to see if you can reproduce these anomalies.

The thing is that in our tests such setups run flawlessly every day :-/ So our attempts to reproduce it will not suddenly make the problem appear if our own biases systematically lead us to build a comparable setup which never exhibits it.

That's why being able to bisect where/when it happens would significantly help us orient our research. At the moment all we know is that the SNI that apache finds does not match the host in the request, while haproxy logs indicate the same are used. There's obviously something wrong that happens sometimes between the two regarding this. The fact that you mention that even with reuse-never it still happens but less frequently is quite shocking and confusing. For now we're missing any idea of what to try or to look for.

Sep 26 '25 14:09 wtarreau
