DNS retries don't seem to be working

Open FireBurn opened this issue 2 years ago • 2 comments

Detailed Description of the Problem

We have a TCP passthrough to a DNS server. This caused us no issues until its IP changed, which required a restart.

I added DNS resolvers to the config, but that also failed during a failover.

I then increased the retries to a high number in case it was taking longer than the defaults to switch IPs.

However, within an hour, and with no changes to the DNS, the server disconnected and refused to connect again:

Server back_d01/external-ingress-test is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
backend back_d01 has no server available!

Expected Behavior

Connection to be reestablished after DNS becomes available

Steps to Reproduce the Behavior

Without resolvers, change the IP of the destination

With resolvers, do nothing...

Do you have any idea what may have caused this?

No response

Do you have an idea how to solve the issue?

No response

What is your configuration?

global
   pidfile /apps/haproxy/etc/haproxy.pid

   # Send logs to standard error rather than syslog
   log stderr local0

   # Enable master/child setup
   master-worker no-exit-on-failure

   # Create an administration socket
   stats socket /apps/haproxy/bin/admin.sock mode 660 level admin expose-fd listeners
   stats timeout 30s

   # Force long-running connections to be closed when a shutdown is requested
   hard-stop-after 30s

   # Set the maximum number of concurrent connections
   maxconn 2048

   # Required for modern certificates
   tune.ssl.default-dh-param 2048

   # Enforce TLS v1.2 and above by default
   ssl-default-bind-options ssl-min-ver TLSv1.2

   # The current list of "strong" ciphers, plus allowing Java 7
   ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256

defaults
   # Send logs to standard output rather than syslog
   log stdout format raw daemon

   # Use Layer 7 (HTTP) and log the requests
   mode http
   option httplog

   # Set the timeouts (in milliseconds)
   timeout connect 5000
   timeout client 30000
   timeout server 30000
   timeout tarpit 3000

   # When doing hostname lookups try to use DNS first, then the last known
   # value and finally a null value
   default-server init-addr libc,last,none

Original config:

frontend proxyfront
   mode tcp
   option tcplog
   bind *:443

   # Use tcp-request content accept to detect the SSL client and server hello
   acl clienthello req_ssl_hello_type 1
   acl serverhello rep_ssl_hello_type 2

   tcp-request inspect-delay 2s
   tcp-request content capture req.ssl_sni len 50
   log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq sni_hdr:%[capture.req.hdr(0)]"

   tcp-request content accept if clienthello

   acl d01_src src -f /apps/was/haproxy/extra-configs/d01_src.txt
   use_backend back_d01 if d01_src

backend back_d01
   mode tcp
   server external-ingress-test dns.com:443 check verify none


Updated config:

resolvers ourdns
   nameserver dns1 10.2.3.4:54
   nameserver dns2 10.2.3.5:54
   nameserver dns3 10.3.2.1:54
   nameserver dns4 10.3.2.2:54
   resolve_retries	600
   timeout retry	10s
   hold	valid		30s

frontend proxyfront
   mode tcp
   option tcplog
   bind *:443

   # Use tcp-request content accept to detect the SSL client and server hello
   acl clienthello req_ssl_hello_type 1
   acl serverhello rep_ssl_hello_type 2

   tcp-request inspect-delay 2s
   tcp-request content capture req.ssl_sni len 50
   log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq sni_hdr:%[capture.req.hdr(0)]"

   tcp-request content accept if clienthello

   acl d01_src src -f /apps/haproxy/extra-configs/d01_src.txt
   use_backend back_d01 if d01_src

backend back_d01
   mode tcp
   server external-ingress-test dns.com:443 check resolvers ourdns verify none

Output of haproxy -vv

./haproxy -vv 
HAProxy version 2.6.2-16a3646 2022/07/22 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.2.html
Running on: Linux 4.18.0-372.13.1.el8_6.x86_64 #1 SMP Mon Jun 6 15:05:22 EDT 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O3 -march=haswell -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -fPIE -fstack-protector-all -fno-strict-aliasing -D_FORTIFY_SOURCE=2
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_LIBCRYPT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.4
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.32 2018-09-10
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-10)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

No response

FireBurn avatar Aug 23 '22 09:08 FireBurn

Could you confirm that no request is sent to any of your 4 nameservers? When the IP change occurs, could you provide 2 or 3 outputs of the "show resolvers" command, taken one minute apart? The message you saw is displayed when no response was received from any nameserver after the configured number of retries.

capflam avatar Aug 25 '22 07:08 capflam

It may be related to #1286. Could you try DNS over TCP to perform your dynamic resolutions?
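
For reference, a minimal sketch of what switching the reported resolvers section to DNS over TCP might look like. It assumes the "tcp@" address prefix on nameserver lines (documented for HAProxy 2.4 and later) and reuses the addresses, port, and timers from the report unchanged. It has not been tested against this setup.

```
resolvers ourdns
   # Same nameservers as in the report, queried over TCP instead of UDP.
   # Assumption: the "tcp@" address prefix is supported by this build (2.4+).
   nameserver dns1 tcp@10.2.3.4:54
   nameserver dns2 tcp@10.2.3.5:54
   nameserver dns3 tcp@10.3.2.1:54
   nameserver dns4 tcp@10.3.2.2:54
   resolve_retries 600
   timeout retry   10s
   hold valid      30s
```

The server line in back_d01 would stay as it is, since it refers to the resolvers section by name (resolvers ourdns).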

capflam avatar Aug 25 '22 10:08 capflam

FYI, 2.6.8 was released. Since 2.6.2, several bugs in the resolvers have been fixed. I'm closing the issue, but if you are still hit by it, feel free to reopen it. Thanks!

capflam avatar Jan 27 '23 14:01 capflam