haproxy
DNS retries don't seem to be working
Detailed Description of the Problem
We have a TCP passthrough to a DNS server. This caused us no issues until its IP changed, which required a restart.
I added DNS resolvers to the config, but that also failed during a failover.
I then increased the retries to a high number in case it was taking longer than the defaults to switch IPs.
However, within an hour, and with no changes to the DNS, the server disconnected and refused to connect again:
Server back_d01/external-ingress-test is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
backend back_d01 has no server available!
Expected Behavior
The connection should be reestablished once DNS becomes available.
Steps to Reproduce the Behavior
Without resolvers: change the IP of the destination.
With resolvers: do nothing; the server goes DOWN on its own within an hour.
Do you have any idea what may have caused this?
No response
Do you have an idea how to solve the issue?
No response
What is your configuration?
global
pidfile /apps/haproxy/etc/haproxy.pid
# Send logs to standard error rather than syslog
log stderr local0
# Enable master/child setup
master-worker no-exit-on-failure
# Create an administration socket
stats socket /apps/haproxy/bin/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
# Force long-running connections to be closed when a shutdown is requested
hard-stop-after 30s
# Set the maximum number of concurrent connections
maxconn 2048
# Required for modern certificates
tune.ssl.default-dh-param 2048
# Enforce TLS v1.2 and above by default
ssl-default-bind-options ssl-min-ver TLSv1.2
# The current list of "strong" ciphers, plus support for Java 7
ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256
defaults
# Send logs to standard output rather than syslog
log stdout format raw daemon
# Use Layer 7 (HTTP) and log the requests
mode http
option httplog
# Set the timeouts (in milliseconds)
timeout connect 5000
timeout client 30000
timeout server 30000
timeout tarpit 3000
# When doing hostname lookups try to use DNS first, then the last known
# value and finally a null value
default-server init-addr libc,last,none
Original config:
frontend proxyfront
mode tcp
option tcplog
bind *:443
# Use tcp-request content accept to detect the SSL client and server hello
acl clienthello req_ssl_hello_type 1
acl serverhello rep_ssl_hello_type 2
tcp-request inspect-delay 2s
tcp-request content capture req.ssl_sni len 50
log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq sni_hdr:%[capture.req.hdr(0)]"
tcp-request content accept if clienthello
acl d01_src src -f /apps/was/haproxy/extra-configs/d01_src.txt
use_backend back_d01 if d01_src
backend back_d01
mode tcp
server external-ingress-test dns.com:443 check verify none
Updated config:
resolvers ourdns
nameserver dns1 10.2.3.4:54
nameserver dns2 10.2.3.5:54
nameserver dns3 10.3.2.1:54
nameserver dns4 10.3.2.2:54
resolve_retries 600
timeout retry 10s
hold valid 30s
frontend proxyfront
mode tcp
option tcplog
bind *:443
# Use tcp-request content accept to detect the SSL client and server hello
acl clienthello req_ssl_hello_type 1
acl serverhello rep_ssl_hello_type 2
tcp-request inspect-delay 2s
tcp-request content capture req.ssl_sni len 50
log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq sni_hdr:%[capture.req.hdr(0)]"
tcp-request content accept if clienthello
acl d01_src src -f /apps/haproxy/extra-configs/d01_src.txt
use_backend back_d01 if d01_src
backend back_d01
mode tcp
server external-ingress-test dns.com:443 check resolvers ourdns verify none
Output of haproxy -vv
./haproxy -vv
HAProxy version 2.6.2-16a3646 2022/07/22 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.2.html
Running on: Linux 4.18.0-372.13.1.el8_6.x86_64 #1 SMP Mon Jun 6 15:05:22 EDT 2022 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O3 -march=haswell -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -fPIE -fstack-protector-all -fno-strict-aliasing -D_FORTIFY_SOURCE=2
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_LIBCRYPT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_PROMEX=1
DEBUG = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS
Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1k FIPS 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.4
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.32 2018-09-10
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-10)
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : prometheus-exporter
Available filters :
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace
Last Outputs and Backtraces
No response
Additional Information
No response
Could you confirm that no requests are being sent to any of your 4 nameservers? When the IP change occurs, could you provide 2 or 3 outputs of the show resolvers command, taken one minute apart? The message you saw is displayed when no response was received from any nameserver after the configured number of retries.
It may be related to #1286. Could you try DNS over TCP to perform your dynamic resolutions?
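For reference, a minimal sketch of what the DNS-over-TCP variant of the reporter's resolvers section might look like. It assumes the tcp@ address prefix on nameserver lines is available in this build (to my understanding it is supported in recent branches, including 2.6) and that the four nameservers accept TCP queries on port 54. Neither assumption is confirmed in this thread. The other settings are copied unchanged from the updated config.

```haproxy
resolvers ourdns
    # Same nameservers as in the updated config, queried over TCP instead of UDP.
    # Assumption: the servers also listen for TCP on port 54.
    nameserver dns1 tcp@10.2.3.4:54
    nameserver dns2 tcp@10.2.3.5:54
    nameserver dns3 tcp@10.3.2.1:54
    nameserver dns4 tcp@10.3.2.2:54
    # Unchanged from the reporter's config
    resolve_retries 600
    timeout retry 10s
    hold valid 30s
```

The server line in back_d01 stays as it is, since it only references the resolvers section by name.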
FYI, 2.6.8 has been released. Since 2.6.2, several resolver bugs have been fixed. I'm closing the issue, but if you are still hitting the problem, feel free to reopen it. Thanks!