PyHive
                                
                                
                                
Can the connection timeout be configured?
Hi,
When using hive.connect, can a timeout be configured?
cursor = hive.connect(host='xxx', port=xxx, database=xxx, auth='KERBEROS', kerberos_service_name=xxx).cursor()
cursor.execute('SELECT * FROM xxx')
I didn't see a timeout parameter. Thanks.
`class Connection(object):
    """Wraps a Thrift session"""

    def __init__(self, host=None, port=None, username=None, database='default', auth=None,
                 configuration=None, kerberos_service_name=None, password=None,
                 thrift_transport=None):
        """Connect to HiveServer2

        :param host: What host HiveServer2 runs on
        :param port: What port HiveServer2 runs on. Defaults to 10000.
        :param auth: The value of hive.server2.authentication used by HiveServer2.
            Defaults to ``NONE``.
        :param configuration: A dictionary of Hive settings (functionally same as the `set` command)
        :param kerberos_service_name: Use with auth='KERBEROS' only
        :param password: Use with auth='LDAP' only
        :param thrift_transport: A ``TTransportBase`` for custom advanced usage.
            Incompatible with host, port, auth, kerberos_service_name, and password.
        """`
Traceback (most recent call last):
  File "/Users/xxx/Documents/dev/venv/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 104, in open
    handle.connect(sockaddr)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 60] Operation timed out
Same question. Are there any examples of the configuration parameter?
Can anyone explain how to pass a timeout to the connection? Is it an element of the configuration dictionary?
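Going by the docstring quoted above, `configuration` holds Hive session settings (the same thing as running `set key=value`), so it is probably not where a socket timeout belongs. A minimal sketch of passing such settings follows. The helper name `connect_with_settings` and the two example properties are only illustrations, and the values are written as strings on the assumption that they are sent to the server as string pairs.

```python
# Illustrative Hive session settings; swap in whatever your cluster needs.
# Values are strings here on the assumption that they travel as string pairs.
EXAMPLE_SETTINGS = {
    "mapreduce.job.queuename": "default",
    "hive.exec.parallel": "true",
}


def connect_with_settings(host, port=10000, database="default", settings=None):
    """Open a PyHive connection with Hive session settings applied.

    Hypothetical helper, not part of PyHive. Note that these are Hive
    settings, not socket options, so they do not set a connection timeout.
    """
    # Imported lazily so this snippet can be loaded without PyHive installed.
    from pyhive import hive

    return hive.connect(
        host=host,
        port=port,
        database=database,
        configuration=dict(settings or {}),
    )


# Usage (needs a reachable HiveServer2, so it is left commented out):
# conn = connect_with_settings("xxx", settings=EXAMPLE_SETTINGS)
```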
+1
+1
+1
Sadly, it seems that PyHive doesn't provide this. In the PyHive source, the socket is created like this:
socket = thrift.transport.TSocket.TSocket(host, port)
One may then call the following TSocket method to set the timeout:
socket.setTimeout(timeout_ms)
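To see what such a timeout does at the socket level, here is a small sketch that uses only the standard library instead of Thrift. The function names `read_with_timeout` and `demo` are made up for this illustration. The one detail carried over from above is the unit: the Thrift call takes milliseconds, while Python's own socket API takes seconds.

```python
import socket


def read_with_timeout(host, port, timeout_ms):
    """Connect, then try to read one byte, giving up after timeout_ms."""
    # The stdlib socket API expects seconds, so convert from milliseconds.
    timeout_s = timeout_ms / 1000.0
    sock = socket.create_connection((host, port), timeout=timeout_s)
    try:
        return sock.recv(1)
    except socket.timeout:
        # The peer did not answer within the deadline.
        return None
    finally:
        sock.close()


def demo():
    """Read from a local listener that never replies, so the read times out."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)  # the connection is queued, but nothing is ever sent
    try:
        host, port = server.getsockname()
        return read_with_timeout(host, port, timeout_ms=200)
    finally:
        server.close()
```

Without the timeout, the `recv` call in this sketch would block indefinitely, which is the kind of hang a socket timeout is meant to prevent.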
In my case, I am using PLAIN authentication, so I just implemented a little function like so:
import sasl
from thrift_sasl import TSaslClientTransport
from thrift.transport.TSocket import TSocket

def create_hive_plain_transport(host, port, username, password, timeout=60):
    """Build a SASL PLAIN transport whose socket times out after `timeout` seconds."""
    socket = TSocket(host, port)
    # setTimeout expects milliseconds
    socket.setTimeout(timeout * 1000)
    sasl_auth = 'PLAIN'

    def sasl_factory():
        # Called by the transport to create the SASL client when it opens
        sasl_client = sasl.Client()
        sasl_client.setAttr('host', host)
        sasl_client.setAttr('username', username)
        sasl_client.setAttr('password', password)
        sasl_client.init()
        return sasl_client

    return TSaslClientTransport(sasl_factory, sasl_auth, socket)
And now, when running connect, I use this function to create the thrift transport:
from pyhive import hive

hive.connect(
    thrift_transport=create_hive_plain_transport(
        host='bla',
        port=10000,
        username='me',
        password='password',
        timeout=120
    ),
    database='bla'
)
See the following code in PyHive for inspiration (as I did) :smile:
I noticed this approach from the pyhs2 Connection constructor.
Hope this helps someone :smile: Fotis
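For the KERBEROS setup from the original question, the same approach should carry over. The sketch below mirrors the PLAIN function above but uses the GSSAPI mechanism and sets the service name instead of a username and password. The function name `create_hive_kerberos_transport` is made up here, and this has not been tested against a real cluster; it assumes a valid Kerberos ticket is already available (for example via `kinit`).

```python
def create_hive_kerberos_transport(host, port, kerberos_service_name, timeout=60):
    """Build a SASL GSSAPI (Kerberos) transport with a socket timeout in seconds.

    Hypothetical helper modelled on the PLAIN version above.
    """
    # Imported lazily so this snippet can be loaded without these packages.
    import sasl
    from thrift.transport.TSocket import TSocket
    from thrift_sasl import TSaslClientTransport

    sock = TSocket(host, port)
    # setTimeout expects milliseconds
    sock.setTimeout(timeout * 1000)

    def sasl_factory():
        sasl_client = sasl.Client()
        sasl_client.setAttr('host', host)
        # Kerberos identifies the server by service name, not by password
        sasl_client.setAttr('service', kerberos_service_name)
        sasl_client.init()
        return sasl_client

    # KERBEROS in hive.server2.authentication corresponds to GSSAPI in SASL
    return TSaslClientTransport(sasl_factory, 'GSSAPI', sock)


# Usage (needs a reachable HiveServer2, so it is left commented out):
# from pyhive import hive
# cursor = hive.connect(
#     thrift_transport=create_hive_kerberos_transport('xxx', 10000, 'hive', timeout=120),
#     database='default',
# ).cursor()
```

As in the PLAIN example, host, port, auth and kerberos_service_name are not passed to hive.connect itself, since the docstring says they are incompatible with thrift_transport. The service name 'hive' in the usage comment is only a placeholder.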
Are there any plans to add a timeout parameter to hive.connect?
Are there any new insights on this small config option?
I tried adding a timeout argument and value, but it failed.