PyHive
Hive connections not working on Windows
Using Anaconda2 with sasl from: http://www.lfd.uci.edu/~gohlke/pythonlibs/ allows the package to install and load, but establishing a connection fails:
In [1]: from pyhive import hive
In [2]: connection = hive.connect(xxxxx)
---------------------------------------------------------------------------
TTransportException                       Traceback (most recent call last)
<ipython-input-2-6036f792e6bb> in <module>()
----> 1 connection = hive.connect(xxxxx)

C:\Anaconda2\lib\site-packages\pyhive\hive.pyc in connect(*args, **kwargs)
     59     :returns: a :py:class:`Connection` object.
     60     """
---> 61     return Connection(*args, **kwargs)
     62
     63

C:\Anaconda2\lib\site-packages\pyhive\hive.pyc in __init__(self, host, port, username, database, configuration)
     84
     85         try:
---> 86             self._transport.open()
     87             open_session_req = ttypes.TOpenSessionReq(
     88                 client_protocol=ttypes.TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V1,

C:\Anaconda2\lib\site-packages\thrift_sasl\__init__.pyc in open(self)
     70     if not ret:
     71       raise TTransportException(type=TTransportException.NOT_OPEN,
---> 72         message=("Could not start SASL: %s" % self.sasl.getError()))
     73
     74     # Send initial response

TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2
This is probably an issue with the SASL libraries not being readily available on Windows, but if anyone has managed to get pyhive working on Windows, I'd appreciate a pointer.
Sorry, I have no idea about windows stuff. (I've only ever tried linux)
On Thu, Nov 19, 2015 at 12:52 PM, Evan McClain [email protected] wrote:
Yeah, me neither :(
Trying to port some tools to windows since Linux and OS X users are a minority where I work.
Hi @aeroevan, did you find a workaround for this?
Also interested if anyone has found a workaround for this. I have had no luck after a half-hour of troubleshooting it. Looks like it's specifically the thrift_sasl library with the issue on Windows.
I've got it running with smooth sailing on both CentOS and Ubuntu, but Windows is still no dice.
I'm guessing the issue falls back to https://github.com/toddlipcon/python-sasl/issues/3
On windows, I think we got pyodbc working but our real solution was to just get https://github.com/jupyter/jupyterhub installed on a Linux machine.
same issue, any ideas on what might be going on?
It seems that it works well on Linux, thanks.
See my PR https://github.com/dropbox/PyHive/pull/122 to install on Windows. For the dependencies, make sure to use the --no-deps flag with pip. It also requires pure-sasl.
** Note that for Anaconda, the dependency versions installed through conda might not meet the minimum versions required. Use pip in this case.
Dependencies
# for Presto
requests
# for Hive
thrift>=0.10.0
thrift_sasl>=0.2.1
pure-sasl>=0.3.0
> pip install thrift --no-deps
> pip install thrift_sasl --no-deps
> pip install pure-sasl
To install pyhive, clone the repo and change to branch feature-pure-sasl-win. In the repo folder run
> python setup.py sdist
> cd dist
> pip install PyHive-0.3.0.dev0.tar.gz
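After following steps like the above, it can be hard to tell which SASL implementation the environment actually picked up. This small check is not part of the original instructions (the helper name is mine); it only inspects what is importable:

```python
import importlib.util

def sasl_backend():
    # Report which SASL implementation is importable, if any.
    if importlib.util.find_spec("sasl") is not None:
        return "sasl"       # Cyrus SASL C extension (often fails to build on Windows)
    if importlib.util.find_spec("puresasl") is not None:
        return "puresasl"   # pure-Python implementation from the pure-sasl package
    return "none"

print(sasl_backend())
```

If this prints "sasl" on Windows even after installing pure-sasl, pyhive will still try (and likely fail) to use the C extension, which matches the behavior reported later in this thread.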
@devinstevenson
I'm having the same problem as OP. I have tried to follow your solution: uninstalled sasl, installed pure-sasl, cloned the repo, and ran the following. (Since there is no feature-pure-sasl-win branch anymore, I didn't change branches.)
> python setup.py sdist
> cd dist
> pip install PyHive-0.5.0.dev0.tar.gz
and pyhive still tries to import sasl. How should I solve this?
@aeroevan @pyite1 got any workarounds?
Were you able to connect to HiveServer2 with the pyhive lib on Windows? @aeroevan @SaucePan1
Any further insights on getting pyhive to work for Windows?
Same issue here, anyone have a solution?
In [2]: conn = hive.Connection(host='10.116.31.132', port=8080, username='hadoop', database='audit')
---------------------------------------------------------------------------
TTransportException                       Traceback (most recent call last)
in ()
----> 1 conn = hive.Connection(host='10.116.31.132', port=8080, username='hadoop', database='audit')

c:\users\user.virtualenvs\nifi-_zbo39c7\lib\site-packages\pyhive\hive.py in __init__(self, host, port, username, database, auth, configuration, kerberos_service_name, password, thrift_transport)
    160
    161         try:
--> 162             self._transport.open()
    163             open_session_req = ttypes.TOpenSessionReq(
    164                 client_protocol=protocol_version,

c:\users\user.virtualenvs\nifi-_zbo39c7\lib\site-packages\thrift_sasl\__init__.py in open(self)
     77     if not ret:
     78       raise TTransportException(type=TTransportException.NOT_OPEN,
---> 79         message=("Could not start SASL: %s" % self.sasl.getError()))
     80
     81     # Send initial response

TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
Same issue here....
there is no feature-pure-sasl-win...
@devinstevenson I am unable to find the feature or the branch you mentioned for windows. Any pointers?
Following:
> pip install thrift --no-deps
> pip install thrift_sasl --no-deps
> pip install pure-sasl
> python setup.py install
Having the same issue with the latest version using the cloned repo. Still won't use pure-sasl on Windows:
File "E:\Anaconda3\lib\site-packages\pyhive-0.6.0.dev0-py3.6.egg\pyhive\hive.py", line 152, in __init__
ModuleNotFoundError: No module named 'sasl'
I tried to change hive.py to use pure-sasl but the interfaces are much different than sasl.
Ended up downloading the Hortonworks Hive ODBC driver, then used pyodbc; worked fine with EMR.
Same issue!!! It ruined my day. If pyhive can't fix it soon, what is an alternative Hive connector? Please tell me, thanks to all of you.
@Yensan I had to resort to the same thing as @dletendre by downloading the ODBC driver and using pyodbc.
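For reference, the ODBC route mentioned above can look roughly like this. This is a sketch, not a tested recipe: the DSN name is whatever you configured in the Hive ODBC driver setup, and the helper names are mine.

```python
def hive_odbc_conn_str(dsn):
    # Minimal ODBC connection string for a pre-configured DSN.
    return f"DSN={dsn}"

def query_hive(dsn, sql):
    # Requires pyodbc and a Hive ODBC driver (e.g. Hortonworks) to be
    # installed; imported lazily so the module loads without them.
    import pyodbc
    conn = pyodbc.connect(hive_odbc_conn_str(dsn), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

This sidesteps SASL entirely, since authentication is handled by the ODBC driver rather than by thrift_sasl.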
@cdeterman thank you. Maybe I will try it sometime, because I find that impala + puresasl works well on Win7 + python3.6 amd64.
Hi @aeroevan @pyite1, pyhive can also use pure-sasl as an alternative dependency, just like impala.
PS: If you can't install pure-sasl on Windows, download it from https://pypi.org/project/pure-sasl/#files
then run
> pip install pure_sasl-0.5.1-py2.py3-none-any.whl
I'm experiencing the same issue -- I'm not able to get pyhive to work on a Windows machine using either sasl==0.2.1 or pure-sasl==0.5.1. Has any attempt been made to actually resolve this issue that has persisted for many years?
Just had the same problem as well. I am going to work on alternatives, but here is an article that points to two other approaches (http://dwgeek.com/guide-connecting-hiveserver2-using-python-pyhive.html/): using Beeline JDBC or, alternately, the Jaydebeapi package.
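A rough sketch of the Jaydebeapi route mentioned in that article, assuming Jaydebeapi and a JVM are installed. The host, credentials, and jar location are placeholders, and the helper names are mine; only the driver class and URL shape come from the standard HiveServer2 JDBC setup.

```python
def hive_jdbc_url(host, port, database="default"):
    # Standard HiveServer2 JDBC URL shape.
    return f"jdbc:hive2://{host}:{port}/{database}"

def connect_via_jdbc(host, port, user, password, jar_path):
    # Imported lazily: jaydebeapi starts a JVM and needs the Hive JDBC
    # driver jar on disk (jar_path is an assumed location).
    import jaydebeapi
    return jaydebeapi.connect(
        "org.apache.hive.jdbc.HiveDriver",  # driver class from the Hive JDBC jar
        hive_jdbc_url(host, port),
        [user, password],
        jar_path,
    )
```

Like the ODBC workaround, this avoids the Python SASL libraries entirely because authentication happens inside the JDBC driver.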
For all those Windows users, I have made a thrift transport library that uses pure-sasl. Please check it out. https://github.com/devinstevenson/pure-transport. There are example files in the repo.
I used https://blog.csdn.net/wenjun_xiao/article/details/104458940 to solve this problem.
In addition to the above link, I referred to https://www.codeleading.com/article/94522886036/ and re-installed Miniconda 4.6.14, which was released in March 2019.
And it worked.
My environment: Windows Server 2019 x64.
It did not work on my previous Miniconda 4.7 install, at least not without the configuration below.
If you install sasl with conda, then add a registry string value under HKEY_LOCAL_MACHINE\SOFTWARE\Carnegie Mellon\Project Cyrus\SASL Library pointing at C:\Users\cdarling\Miniconda3\envs\hive\Library\bin\sasl2
Another option is to move the sasl2 folder to C:\CMU\bin\
See https://github.com/cyrusimap/cyrus-sasl/blob/master/lib/common.c#L2472 for reference.
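The registry workaround above can be scripted. This is a sketch under assumptions: the value name Cyrus SASL reads is "SearchPath" (verify against the common.c linked above before relying on it), and the conda env path is the one from the comment. The registry write requires Windows and Administrator rights, so it is guarded.

```python
import sys

# Registry key that Cyrus SASL consults on Windows (from the comment above).
SASL_KEY = r"SOFTWARE\Carnegie Mellon\Project Cyrus\SASL Library"

def sasl_plugin_dir(env_prefix):
    # Plugin folder layout inside a conda env, per the comment above.
    return env_prefix + r"\Library\bin\sasl2"

def register_sasl_plugins(env_prefix, value_name="SearchPath"):
    # "SearchPath" is an assumption -- check it against the linked common.c.
    import winreg  # Windows-only stdlib module
    with winreg.CreateKey(winreg.HKEY_LOCAL_MACHINE, SASL_KEY) as key:
        winreg.SetValueEx(key, value_name, 0, winreg.REG_SZ,
                          sasl_plugin_dir(env_prefix))

if __name__ == "__main__" and sys.platform == "win32":
    register_sasl_plugins(r"C:\Users\cdarling\Miniconda3\envs\hive")
```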
For all those Windows users, I have made a thrift transport library that uses pure-sasl. Please check it out. https://github.com/devinstevenson/pure-transport. There are example files in the repo.
Thanks, I will try this.
This thread is almost two years old, is there any way to use pyhive with sqlalchemy on Windows now? pyodbc is supported for mssql, but there is no dialect for hive. Any updates?
I'm using PyHive from Windows with SqlAlchemy and without ssl.
Can you please show me how?
Sure.
- Put correct values for user, password, and host below.
- Your HIVE server should be configured with NOSASL.
- The HIVE server should work with MAP-REDUCE (you can change it, but I didn't test this use-case).
- The code below was tested on Python 3.8 with the following dependencies:
future==0.18.2
PyHive==0.6.2
python-dateutil==2.8.1
six==1.15.0
SQLAlchemy==1.3.3
thrift==0.13.0
import logging
import time

# fix for "AttributeError: module 'time' has no attribute 'clock'"
time.clock = time.time

from time import sleep
from atexit import register as atexit_register

from sqlalchemy import create_engine
from sqlalchemy.sql import select
from sqlalchemy.schema import MetaData

import pandas as pd

# from alexber.utils.mains import fixabscwd

logger = None


def run():
    user = ''
    password = ''
    host = ''
    port = 10000
    dbname = 'default'

    engine = create_engine(
        # 'hive://localhost:10000/default?auth=NOSASL',
        f'hive://{user}:{password}@{host}:{port}/{dbname}?auth=NOSASL',
        connect_args={'configuration': {'hive.execution.engine': 'mr',
                                        'hive.mapred.mode': 'nonstrict',
                                        'hive.auto.convert.join': 'false',
                                        # 'hive.server2.authentication': 'NOSASL'
                                        },
                      # 'thrift_transport': transport
                      })
    atexit_register(engine.dispose)

    # df = pd.read_sql('select * from db_name.table_name', con=engine, coerce_float=False)
    # df2 = pd.read_sql_table('table_name', schema='db_name', con=engine, coerce_float=False)

    meta = MetaData(engine, schema="poc_test")
    meta.reflect(only=["colors"])  # include_columns="[id,color]"
    colors = meta.tables.get('poc_test.colors')

    # not practical, for demonstration purposes only
    ins = colors.insert().values(color=2)
    print(f"Insert statement {ins}")
    print(f"Insert params {ins.compile().params}")
    engine.execute(ins)

    sleep(60)

    slct = colors.select()
    result = engine.execute(slct)
    print("Select all result")
    for row in result:
        print(row)

    s = select([colors.c.color])
    result = engine.execute(s)
    print("Select color result")
    for row in result:
        print(row)


def main(args=None):
    """
    main method

    :param args: if not None, suppresses sys.args
    """
    logging.basicConfig(format='%(asctime)-15s %(levelname)s [%(name)s.%(funcName)s] %(message)s',
                        level=logging.INFO)
    logging.getLogger('sqlalchemy').setLevel(logging.DEBUG)
    logging.captureWarnings(True)

    # fixabscwd()

    global logger
    logger = logging.getLogger(__name__)

    run()


if __name__ == "__main__":
    main()