vertica-python icon indicating copy to clipboard operation
vertica-python copied to clipboard

enabling TCP keepalive to avoid connections getting closed

Open traffetseder opened this issue 6 years ago • 12 comments

We ran into an issue with NAT gateway closing connections for long running queries that do not stream data back to the client. To fix this, we enabled TCP keepalive on the socket.

I haven't created a PR because this is linux specific code, but wanted to add it here to see if the community finds it useful.

See diff below.

-Thomas

--- a/vertica_python/vertica/connection.py
+++ b/vertica_python/vertica/connection.py
@@ -111,6 +112,20 @@ class Connection(object):
         self.transaction_status = None
         self.socket = None

+    def set_keepalive_linux(self, sock, after_idle_sec=60, interval_sec=60, max_fails=10):
+        """Set TCP keepalive on an open socket.
+        It activates after after_idle_sec of idleness,
+        then sends a keepalive ping once every interval_sec,
+        and closes the connection after max_fails failed ping ()
+        """
+        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
+        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, after_idle_sec)
+        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec)
+        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, max_fails)
+
     def _socket(self):
         if self.socket is not None:
             return self.socket
@@ -121,9 +136,13 @@ class Connection(object):
         raw_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
         raw_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
         if connection_timeout is not None:
             raw_socket.settimeout(connection_timeout)
         raw_socket.connect((host, port))

+        self.set_keepalive_linux(sock=raw_socket, after_idle_sec=60, interval_sec=60, max_fails=10)
+
         ssl_options = self.options.get('ssl')
         if ssl_options is not None and ssl_options is not False:
             from ssl import CertificateError, SSLError

traffetseder avatar May 23 '18 00:05 traffetseder

I think I am having exactly this issue but I cannot tell. I have a long running query and after ~5 minutes, it stops with a failure: Unhandled error while executing query [Errno 32] Broken pipe Using vertica-python in --> dbt-vertica package.

ericwoodall avatar Feb 18 '21 19:02 ericwoodall

@ericwoodall It's hard to tell the problem from your description, here are some of directions for debug:

The latest vertica-python already set TCP keepalive as default: Source code, except you cannot config keepalive ping parameters (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) like the code above. Therefore TCP keepalive should not be the root of your problem if you are using latest version of vertica-python. In addition, vertica-python provides connection_timeout as a connection parameter to control socket operations timeout (see README), queries can be affected by this. Please note that dbt-vertica is a third-party library that we don't support. If you want to set connection_timeout, you may need to change your libraries source code.

On the other hand, a Broken Pipe error could mean a lot of things. It is possible that your Vertica server closes this connection. So I think it's necessary to check the query itself as well.

sitingren avatar Feb 19 '21 15:02 sitingren

@sitingren thanks for the reply! I feel confident that it is not the query because I can execute the same query in DBViz and it returns data. I am going to install the latest version of both vertica-python and dbt-vertica and see if this fixes it. Will report back. Thanks again!

ericwoodall avatar Feb 19 '21 16:02 ericwoodall

@sitingren I installed the latest version of vertica-python (1.0.1) and I am still getting the same broken pipe error. Also, I verified that it is not the query, as I did a table insert [using the same query] and it succeeded.

ericwoodall avatar Feb 19 '21 17:02 ericwoodall

@ericwoodall Is DBViz also using vertica-python? Have you toggled connection_timeout setting in dbt-vertica? Since there is no traceback message or log to look at, I think running the query on vertica-python directly might narrow down the problem to be whether on dbt-vertica or vertica-python side.

sitingren avatar Feb 20 '21 03:02 sitingren

@sitingren , no DBViz is not using vertica-python but the query returns fine so it is not a server misconfiguration. I am in the process of creating a test python script that uses vertica-python directly. Will report back.

ericwoodall avatar Feb 20 '21 14:02 ericwoodall

+1 for Connection reset and [Errno 32] Broken pipe - same issue for 5+ minutes request to vertica from python code (not Dbt). Really irritating issue. Same code with different SQL works.

I am not expert but connection_timeout uses different method:

raw_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
raw_socket.settimeout(connection_timeout)

I have a code that have being broken for 1 year due to this error! Any ideas how set socket.SO_KEEPALIVE?

C-h-e-r-r-y avatar Apr 17 '23 16:04 C-h-e-r-r-y

@C-h-e-r-r-y What vertica-python version are you using? Have you checked out your server configs? In https://docs.vertica.com/12.0.x/en/sql-reference/config-parameters/general-parameters/, you can see parameters such as KeepAliveIdleTime, KeepAliveProbeCount, KeepAliveProbeInterval.

sitingren avatar Apr 18 '23 04:04 sitingren

@sitingren Many thanks for quick reply!!!! I have tried different versions: 0.9.2, 1.2+ and 1.3.0

Vertica sever version is Vertica Analytic Database v9.2.0-0

SHOW CURRENT ALL; give a lot of values but KeepAliveIdleTime, KeepAliveProbeCount, KeepAliveProbeInterval - are not present. :(

About serverside configs - good idea but I do not know what to try :(

C-h-e-r-r-y avatar Apr 22 '23 13:04 C-h-e-r-r-y

@sitingren Pleae note that connection for same SQL request works from Intellij Idea. E.g. looks like something wrong with python library or settings. Intellij Idea is run from Windows while python from Docker

C-h-e-r-r-y avatar Apr 22 '23 14:04 C-h-e-r-r-y

@C-h-e-r-r-y Please use the latest vertica-python version if possible. Have you set 'connection_timeout' in your connection options?

You can inspect the open socket once you create a Connection object:

import vertica_python
conn_info = {'host': 'xxx', 'user': 'xxx', ..., 'connection_timeout': 10}

with vertica_python.connect(**conn_info) as conn:
  print(conn.socket.gettimeout())  # set by 'connection_timeout'
  print(conn.socket.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # should be 1
  print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
  print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL))
  print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT))

sitingren avatar Apr 24 '23 09:04 sitingren

My steps and result of investigations:

  1. I have tried the 1.3.0 but connection_timeout or read_timeout did not help me.
  2. I have setup odbc and result was the same.
  3. I have setup jdbc via jaydebeapi and got more or less same behavior with timeout errors.
  4. Next I run python code (vertica-python version) in Docker with --sysctl net.ipv4.tcp_keepalive_time=120 --sysctl net.ipv4.tcp_keepalive_intvl=60 --sysctl net.ipv4.tcp_keepalive_probes=6 --sysctl net.ipv4.tcp_fin_timeout=30 and after that I got my code working.

I have not yet investigated this in detail. The only what I can say now is connection_timeout and read_timeout are "ignored" and to set up timeouts you have to set them on OS level.

Your post about KeepAliveIdleTime, KeepAliveProbeCount, KeepAliveProbeInterval - gave me right direction but set them from client's side. Again many thanks for this :)

C-h-e-r-r-y avatar Apr 25 '23 21:04 C-h-e-r-r-y