vertica-python
vertica-python copied to clipboard
enabling TCP keepalive to avoid connections getting closed
We ran into an issue with NAT gateway closing connections for long running queries that do not stream data back to the client. To fix this, we enabled TCP keepalive on the socket.
I haven't created a PR because this is linux specific code, but wanted to add it here to see if the community finds it useful.
See diff below.
-Thomas
--- a/vertica_python/vertica/connection.py
+++ b/vertica_python/vertica/connection.py
@@ -111,6 +112,20 @@ class Connection(object):
self.transaction_status = None
self.socket = None
+ def set_keepalive_linux(self, sock, after_idle_sec=60, interval_sec=60, max_fails=10):
+ """Set TCP keepalive on an open socket.
+ It activates after after_idle_sec of idleness,
+ then sends a keepalive ping once every interval_sec,
+ and closes the connection after max_fails failed ping ()
+ """
+ sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
+ sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, after_idle_sec)
+ sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec)
+ sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, max_fails)
+
def _socket(self):
if self.socket is not None:
return self.socket
@@ -121,9 +136,13 @@ class Connection(object):
raw_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
raw_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if connection_timeout is not None:
raw_socket.settimeout(connection_timeout)
raw_socket.connect((host, port))
+ self.set_keepalive_linux(sock=raw_socket, after_idle_sec=60, interval_sec=60, max_fails=10)
+
ssl_options = self.options.get('ssl')
if ssl_options is not None and ssl_options is not False:
from ssl import CertificateError, SSLError
I think I am having exactly this issue but I cannot tell. I have a long running query and after ~5 minutes, it stops with a failure: Unhandled error while executing query [Errno 32] Broken pipe Using vertica-python in --> dbt-vertica package.
@ericwoodall It's hard to tell the problem from your description, here are some of directions for debug:
The latest vertica-python already set TCP keepalive as default: Source code, except you cannot config keepalive ping parameters (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) like the code above. Therefore TCP keepalive should not be the root of your problem if you are using latest version of vertica-python. In addition, vertica-python provides connection_timeout
as a connection parameter to control socket operations timeout (see README), queries can be affected by this. Please note that dbt-vertica is a third-party library that we don't support. If you want to set connection_timeout
, you may need to change your libraries source code.
On the other hand, a Broken Pipe error could mean a lot of things. It is possible that your Vertica server closes this connection. So I think it's necessary to check the query itself as well.
@sitingren thanks for the reply! I feel confident that it is not the query because I can execute the same query in DBViz and it returns data. I am going to install the latest version of both vertica-python and dbt-vertica and see if this fixes it. Will report back. Thanks again!
@sitingren I installed the latest version of vertica-python (1.0.1) and I am still getting the same broken pipe error. Also, I verified that it is not the query, as I did a table insert [using the same query] and it succeeded.
@ericwoodall Is DBViz also using vertica-python? Have you toggled connection_timeout
setting in dbt-vertica? Since there is no traceback message or log to look at, I think running the query on vertica-python directly might narrow down the problem to be whether on dbt-vertica or vertica-python side.
@sitingren , no DBViz is not using vertica-python but the query returns fine so it is not a server misconfiguration. I am in the process of creating a test python script that uses vertica-python directly. Will report back.
+1 for Connection reset
and [Errno 32] Broken pipe
- same issue for 5+ minutes request to vertica from python code (not Dbt). Really irritating issue. Same code with different SQL works.
I am not expert but connection_timeout
uses different method:
raw_socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
raw_socket.settimeout(connection_timeout)
I have a code that have being broken for 1 year due to this error! Any ideas how set socket.SO_KEEPALIVE?
@C-h-e-r-r-y What vertica-python version are you using? Have you checked out your server configs?
In https://docs.vertica.com/12.0.x/en/sql-reference/config-parameters/general-parameters/, you can see parameters such as KeepAliveIdleTime
, KeepAliveProbeCount
, KeepAliveProbeInterval
.
@sitingren Many thanks for quick reply!!!! I have tried different versions: 0.9.2, 1.2+ and 1.3.0
Vertica sever version is Vertica Analytic Database v9.2.0-0
SHOW CURRENT ALL;
give a lot of values but KeepAliveIdleTime
, KeepAliveProbeCount
, KeepAliveProbeInterval
- are not present. :(
About serverside configs - good idea but I do not know what to try :(
@sitingren Pleae note that connection for same SQL request works from Intellij Idea. E.g. looks like something wrong with python library or settings. Intellij Idea is run from Windows while python from Docker
@C-h-e-r-r-y Please use the latest vertica-python version if possible. Have you set 'connection_timeout' in your connection options?
You can inspect the open socket once you create a Connection object:
import vertica_python
conn_info = {'host': 'xxx', 'user': 'xxx', ..., 'connection_timeout': 10}
with vertica_python.connect(**conn_info) as conn:
print(conn.socket.gettimeout()) # set by 'connection_timeout'
print(conn.socket.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)) # should be 1
print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL))
print(conn.socket.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT))
My steps and result of investigations:
- I have tried the
1.3.0
butconnection_timeout
orread_timeout
did not help me. - I have setup odbc and result was the same.
- I have setup jdbc via jaydebeapi and got more or less same behavior with timeout errors.
- Next I run python code (vertica-python version) in Docker with
--sysctl net.ipv4.tcp_keepalive_time=120 --sysctl net.ipv4.tcp_keepalive_intvl=60 --sysctl net.ipv4.tcp_keepalive_probes=6 --sysctl net.ipv4.tcp_fin_timeout=30
and after that I got my code working.
I have not yet investigated this in detail. The only what I can say now is connection_timeout
and read_timeout
are "ignored" and to set up timeouts you have to set them on OS level.
Your post about KeepAliveIdleTime
, KeepAliveProbeCount
, KeepAliveProbeInterval
- gave me right direction but set them from client's side. Again many thanks for this :)