
Consider supporting per-socket TCP Keep-Alive settings: `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`

nickwb opened this issue 2 years ago • 3 comments

Description

First of all, thanks for the existing support for TCP keep-alive through the `socket.keepalive.enable` configuration property.

I am in a similar position to those here: https://github.com/edenhill/librdkafka/issues/3109, and I'm struggling to get my producers to retain persistent connections once they go idle.

I'd like to request additional support for the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` `setsockopt` options, which are supported on many (most?) operating systems and allow the keep-alive to be tuned on a per-socket basis.

Note that `IPPROTO_TCP` must be used instead of `SOL_SOCKET` when calling `setsockopt` with these options.

  • `TCP_KEEPIDLE` means "How long must the socket be idle before we start sending keep-alive probes"
  • `TCP_KEEPINTVL` means "How long between each keep-alive probe"
  • `TCP_KEEPCNT` means "After how many unacknowledged probes will we consider the connection dead"

Background

Currently, users of socket keep-alive must rely on operating-system-wide settings to tune the keep-alive parameters.

Linux, for example, has the following sysctl settings:

  • `net.ipv4.tcp_keepalive_time` (same meaning as `TCP_KEEPIDLE`)
  • `net.ipv4.tcp_keepalive_intvl` (same meaning as `TCP_KEEPINTVL`)
  • `net.ipv4.tcp_keepalive_probes` (same meaning as `TCP_KEEPCNT`)

Using the operating system configurations isn't always ideal:

  • Changing a setting can affect other connections on the system that you did not intend to alter.
  • Changing a setting isn't always possible: application developers do not always tightly control their deployment environments.
  • Changing a setting for a Docker/container deployment is not straightforward:
    • In Kubernetes, the necessary sysctl settings are (usually) considered unsafe, and therefore require you to deploy a privileged container, plus a non-standard kubelet configuration on your hosts.
    • Fully managed Kubernetes services, like Amazon's EKS, Azure's AKS, or Google's GKE, tend to encourage standardised host node images that put you at arm's length from the necessary configuration.
  • The default for `tcp_keepalive_time` (a.k.a. `TCP_KEEPIDLE`) is 2 hours in many configurations, which is far too long for most modern middle-boxes.
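For illustration, the Kubernetes route mentioned above looks roughly like the fragment below. The pod and image names are hypothetical, and this only works if the kubelet on the node has been started with these sysctls allowlisted (e.g. `--allowed-unsafe-sysctls`), which is exactly the non-standard configuration the bullet points describe:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: producer                 # hypothetical pod name
spec:
  securityContext:
    sysctls:
      # These are namespaced but treated as "unsafe" by default,
      # so the kubelet must explicitly allow them.
      - name: net.ipv4.tcp_keepalive_time
        value: "60"
      - name: net.ipv4.tcp_keepalive_intvl
        value: "10"
  containers:
    - name: app
      image: example/producer:latest   # hypothetical image
```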

Platform Support

`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are not part of the POSIX `setsockopt` specification, but they do seem to be fairly broadly supported on most operating systems.

Here are some references to the various docs:

macOS

macOS supports `TCP_KEEPINTVL` and `TCP_KEEPCNT`, but uses `TCP_KEEPALIVE` in place of `TCP_KEEPIDLE`.

https://github.com/apple/darwin-xnu/blob/a1babec6b135d1f35b2590a1990af3c5c5393479/bsd/netinet/tcp.h#L215

Older Windows

`WSAIoctl` with `SIO_KEEPALIVE_VALS` seems to be supported as far back as Windows 2000, though that API does not have an equivalent of `TCP_KEEPCNT`.

Note that these values are also specified in milliseconds, whereas `TCP_KEEPIDLE` and `TCP_KEEPINTVL` are usually specified in seconds.

nickwb avatar May 18 '22 12:05 nickwb

> I'm struggling to get my producers to retain persistent connections once they go idle.

How long do the connections stay open after becoming idle? Is the issue simply that you need to configure `connections.max.idle.ms` on the broker?

mhowlett avatar Sep 21 '22 19:09 mhowlett

For systems that require low write latency, a keep-alive is IMHO the better solution. `connections.max.idle.ms` will close the socket, requiring a new TLS handshake with the partition leader node before the next write. I agree with @nickwb that allowing `TCP_KEEPIDLE` and `TCP_KEEPINTVL` to be configured would help a lot.

robsonpeixoto avatar Sep 21 '22 21:09 robsonpeixoto

TCP-level keepalives will not keep the connection alive past load-balancer or broker idle timeouts, since those timeouts operate at the application level and need Kafka requests to reset them, so I'm not sure adding additional TCP keepalive settings will help in practice.

The Kafka protocol, unfortunately, does not provide a "keepalive" request.

edenhill avatar Oct 10 '22 14:10 edenhill