Hive connections not working on Windows

Open aeroevan opened this issue 8 years ago • 30 comments

Using Anaconda2 with sasl from: http://www.lfd.uci.edu/~gohlke/pythonlibs/ allows the package to install and load, but establishing a connection fails:

In [1]: from pyhive import hive

In [2]: connection = hive.connect(xxxxx)
---------------------------------------------------------------------------
TTransportException                       Traceback (most recent call last)
<ipython-input-2-6036f792e6bb> in <module>()
----> 1 connection = hive.connect(xxxxx)

C:\Anaconda2\lib\site-packages\pyhive\hive.pyc in connect(*args, **kwargs)
     59     :returns: a :py:class:`Connection` object.
     60     """
---> 61     return Connection(*args, **kwargs)
     62
     63

C:\Anaconda2\lib\site-packages\pyhive\hive.pyc in __init__(self, host, port, username, database, configuration)
     84
     85         try:
---> 86             self._transport.open()
     87             open_session_req = ttypes.TOpenSessionReq(
     88                 client_protocol=ttypes.TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V1,

C:\Anaconda2\lib\site-packages\thrift_sasl\__init__.pyc in open(self)
     70     if not ret:
     71       raise TTransportException(type=TTransportException.NOT_OPEN,
---> 72         message=("Could not start SASL: %s" % self.sasl.getError()))
     73
     74     # Send initial response

TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2

This is probably because the SASL libraries are not readily available on Windows, but if anyone has managed to get pyhive working on Windows, I'd appreciate a pointer.

aeroevan avatar Nov 19 '15 20:11 aeroevan

Sorry, I have no idea about Windows stuff. (I've only ever tried Linux.)

jingw avatar Nov 19 '15 20:11 jingw

Yeah, me neither :(

I'm trying to port some tools to Windows, since Linux and OS X users are a minority where I work.

aeroevan avatar Nov 19 '15 21:11 aeroevan

Hi aeroevan, did you find a workaround for this?

aschmu avatar Feb 19 '16 11:02 aschmu

Also interested if anyone has found a workaround for this. I have had no luck after a half-hour of troubleshooting it. Looks like it's specifically the thrift_sasl library with the issue on Windows.

I've got it running with smooth sailing on both CentOS and Ubuntu, but Windows is still no dice.

GISDev01 avatar Mar 07 '16 02:03 GISDev01

I'm guessing the issue falls back to https://github.com/toddlipcon/python-sasl/issues/3

On Windows, I think we got pyodbc working, but our real solution was to just get https://github.com/jupyter/jupyterhub installed on a Linux machine.

aeroevan avatar Mar 09 '16 01:03 aeroevan

Same issue. Any ideas on what might be going on?

pyite1 avatar May 05 '16 19:05 pyite1

It seems that it works well on Linux, thanks.

firecatzkj avatar Jun 08 '17 02:06 firecatzkj

See my PR https://github.com/dropbox/PyHive/pull/122 to install on Windows. For the dependencies, make sure to use the --no-deps flag with pip. It also requires pure-sasl.

Note: for Anaconda, the dependency versions installed through conda might not meet the minimum version required. Use pip in this case.

Dependencies

# for Presto
requests

# for Hive
thrift>=0.10.0
thrift_sasl>=0.2.1
pure-sasl>=0.3.0

> pip install thrift --no-deps
> pip install thrift_sasl --no-deps
> pip install pure-sasl

To install pyhive, clone the repo and switch to the branch feature-pure-sasl-win. In the repo folder, run:

> python setup.py sdist
> cd dist
> pip install PyHive-0.3.0.dev0.tar.gz
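
Before debugging further, it can help to confirm which SASL-related packages the environment actually has after these installs. The following is a minimal sketch using only the standard library; the helper name available_sasl_backends is made up for illustration, and it assumes the usual import names (sasl, puresasl, thrift_sasl, thrift) for the packages listed above.

```python
import importlib.util

# Distribution name -> import name. These import names are assumptions
# based on how the packages are normally imported.
SASL_MODULES = {
    "sasl": "sasl",
    "pure-sasl": "puresasl",
    "thrift_sasl": "thrift_sasl",
    "thrift": "thrift",
}


def available_sasl_backends():
    """Return {distribution name: True/False} for each importable module."""
    return {
        dist: importlib.util.find_spec(module) is not None
        for dist, module in SASL_MODULES.items()
    }


if __name__ == "__main__":
    for dist, found in available_sasl_backends().items():
        print(f"{dist}: {'installed' if found else 'missing'}")
```

If sasl shows as missing while pure-sasl is installed, an import error for sasl at connect time suggests the installed pyhive build is not the one that uses pure-sasl.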

devinstevenson avatar Jun 20 '17 20:06 devinstevenson

@devinstevenson

I'm having the same problem as the OP. I tried to follow your solution: I uninstalled sasl, installed pure-sasl, cloned the repo, and ran the following (since there is no feature-pure-sasl-win branch anymore, I didn't switch branches):

> python setup.py sdist
> cd dist
> pip install PyHive-0.5.0.dev0.tar.gz

and pyhive still tries to import sasl. How should I solve this?

@aeroevan @pyite1 got any workarounds?

SaucePan1 avatar Sep 22 '17 08:09 SaucePan1

Can you connect to HiveServer2 with the pyhive lib on Windows? @aeroevan @SaucePan1

duniang818 avatar Oct 11 '17 07:10 duniang818

Any further insights on getting pyhive to work for Windows?

cdeterman avatar Dec 08 '17 19:12 cdeterman

Same issue here, anyone have a solution?

In [2]: conn = hive.Connection(host='10.116.31.132', port=8080, username='hadoop', database='audit')

TTransportException Traceback (most recent call last) in ()
----> 1 conn = hive.Connection(host='10.116.31.132', port=8080, username='hadoop', database='audit')

c:\users\user.virtualenvs\nifi-_zbo39c7\lib\site-packages\pyhive\hive.py in __init__(self, host, port, username, database, auth, configuration, kerberos_service_name, password, thrift_transport)
    160
    161         try:
--> 162             self._transport.open()
    163             open_session_req = ttypes.TOpenSessionReq(
    164                 client_protocol=protocol_version,

c:\users\user.virtualenvs\nifi-_zbo39c7\lib\site-packages\thrift_sasl\__init__.py in open(self)
     77     if not ret:
     78       raise TTransportException(type=TTransportException.NOT_OPEN,
---> 79         message=("Could not start SASL: %s" % self.sasl.getError()))
     80
     81     # Send initial response

TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'

lyle-w avatar Mar 09 '18 06:03 lyle-w

Same issue here....

charlesmilk avatar Mar 13 '18 12:03 charlesmilk

there is no feature-pure-sasl-win...

383747787 avatar Mar 18 '18 07:03 383747787

@devinstevenson I am unable to find the feature or the branch you mentioned for windows. Any pointers?

anilkulkarni87 avatar Mar 22 '18 00:03 anilkulkarni87

Following:

> pip install thrift --no-deps
> pip install thrift_sasl --no-deps
> pip install pure-sasl
> python setup.py install

Having the same issue with the latest version using a cloned repo. It still won't use pure-sasl on Windows:

File "E:\Anaconda3\lib\site-packages\pyhive-0.6.0.dev0-py3.6.egg\pyhive\hive.py", line 152, in __init__
ModuleNotFoundError: No module named 'sasl'

dletendre avatar Jul 18 '18 19:07 dletendre

I tried to change hive.py to use pure-sasl, but its interfaces are quite different from those of sasl.

I ended up downloading the Hortonworks Hive ODBC driver and then used pyodbc, which worked fine with EMR.
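
For readers trying the ODBC route, a minimal sketch of what it can look like follows. It assumes a Hive ODBC driver is already installed and a DSN has been configured in the Windows ODBC Data Source Administrator; the DSN name and the helper names are hypothetical, not taken from this thread. autocommit=True is passed because Hive does not support transactions.

```python
def hive_odbc_connection_string(dsn, user="", password=""):
    """Build an ODBC connection string for a preconfigured DSN."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def query_hive(dsn, sql, user="", password=""):
    """Run one query through pyodbc and return all rows."""
    import pyodbc  # third-party; imported lazily so the helper above has no dependency

    conn = pyodbc.connect(
        hive_odbc_connection_string(dsn, user, password),
        autocommit=True,  # Hive has no transactions to commit or roll back
    )
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Example (hypothetical DSN name):
# rows = query_hive("MyHiveDSN", "SHOW TABLES")
```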

dletendre avatar Jul 18 '18 20:07 dletendre

Same issue!!! It ruined my day. If pyhive can't fix it soon, what is an alternative Hive connector? Please tell me. Thanks, all of you.

Yensan avatar Aug 29 '18 08:08 Yensan

@Yensan I had to resort to the same thing as @dletendre by downloading the ODBC driver and using pyodbc.

cdeterman avatar Aug 29 '18 19:08 cdeterman

@cdeterman Thank you. Maybe I will try it sometime, because I find that impala + puresasl works well on Win7 + Python 3.6 amd64.

Hi @aeroevan @pyite1, pyhive could also use pure-sasl as an alternative dependency, just like impala.

PS: If you can't install puresasl on Windows, download the wheel from https://pypi.org/project/pure-sasl/#files and then run pip install pure_sasl-0.5.1-py2.py3-none-any.whl

Yensan avatar Aug 30 '18 03:08 Yensan

I'm experiencing the same issue -- I'm not able to get pyhive to work on a Windows machine using either sasl==0.2.1 or pure-sasl==0.5.1. Has any attempt been made to actually resolve this issue that has persisted for many years?

ntanners avatar Nov 14 '18 15:11 ntanners

I just had the same problem as well. I am going to work on alternatives, but here is an article that points to two other approaches (http://dwgeek.com/guide-connecting-hiveserver2-using-python-pyhive.html/): using Beeline JDBC or, alternatively, the Jaydebeapi package.
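
As a rough illustration of the JayDeBeApi route, here is a minimal sketch. It assumes a JVM is installed, the jaydebeapi package is available, and a Hive JDBC driver jar has been downloaded; the jar path, host, and helper names are placeholders, not values from the linked article.

```python
def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def connect_via_jdbc(host, user, password, jar_path, port=10000, database="default"):
    """Open a DB-API connection to HiveServer2 through the Hive JDBC driver."""
    import jaydebeapi  # third-party; starts a JVM under the hood

    return jaydebeapi.connect(
        "org.apache.hive.jdbc.HiveDriver",    # Hive JDBC driver class
        hive_jdbc_url(host, port, database),
        [user, password],                     # driver arguments
        jar_path,                             # path to the Hive JDBC jar (placeholder)
    )


# Example (all values hypothetical):
# conn = connect_via_jdbc("hive-host", "user", "secret", r"C:\drivers\hive-jdbc-standalone.jar")
# cur = conn.cursor(); cur.execute("SHOW TABLES"); print(cur.fetchall())
```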

onlinematters avatar Mar 03 '19 16:03 onlinematters

For all those Windows users, I have made a thrift transport library that uses pure-sasl. Please check it out. https://github.com/devinstevenson/pure-transport. There are example files in the repo.

devinstevenson avatar Mar 10 '19 06:03 devinstevenson

I use https://blog.csdn.net/wenjun_xiao/article/details/104458940 to solve this problem

wenjunxiao avatar Feb 25 '20 12:02 wenjunxiao

In addition to the above link, I referred to https://www.codeleading.com/article/94522886036/ and reinstalled Miniconda 4.6.14, which was released in March 2019.

And it worked.

My environment: Windows Server 2019 x64

It did not work on my previous Miniconda 4.7 install, at least not without configuration below

If you install sasl with conda, add a registry string value under HKEY_LOCAL_MACHINE\SOFTWARE\Carnegie Mellon\Project Cyrus\SASL Library pointing to C:\Users\cdarling\Miniconda3\envs\hive\Library\bin\sasl2

Another option is to move the sasl2 folder to C:\CMU\bin\

See https://github.com/cyrusimap/cyrus-sasl/blob/master/lib/common.c#L2472 for reference.
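
The registry step can also be scripted. The sketch below uses the standard-library winreg module; the key path is the one given above, but the value name "SearchPath" is an assumption based on the linked cyrus-sasl source rather than something stated in this comment, and the function name is made up. Writing under HKEY_LOCAL_MACHINE needs an elevated (administrator) prompt.

```python
import sys

# Key path from the comment above. The value name is an assumption
# (taken from cyrus-sasl's lib/common.c), so verify it before relying on it.
SASL_KEY = r"SOFTWARE\Carnegie Mellon\Project Cyrus\SASL Library"
SASL_VALUE_NAME = "SearchPath"


def register_sasl_plugin_dir(plugin_dir):
    """Store plugin_dir as the Cyrus SASL plugin search path in HKLM."""
    if sys.platform != "win32":
        raise RuntimeError("the Cyrus SASL registry key only applies to Windows")
    import winreg  # Windows-only standard-library module

    # Create the key if it does not exist yet, then write the string value.
    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, SASL_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, SASL_VALUE_NAME, 0, winreg.REG_SZ, plugin_dir)


# Example, using the path from the comment above:
# register_sasl_plugin_dir(r"C:\Users\cdarling\Miniconda3\envs\hive\Library\bin\sasl2")
```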

cdarlint avatar May 11 '20 04:05 cdarlint

For all those Windows users, I have made a thrift transport library that uses pure-sasl. Please check it out. https://github.com/devinstevenson/pure-transport. There are example files in the repo.

Thanks, I will try this.

alex-ber avatar Jan 20 '21 00:01 alex-ber

This thread is almost two years old. Is there any way to use pyhive with SQLAlchemy on Windows now? pyodbc is supported for mssql, but there is no dialect for hive.pyodbc. Any updates?

pritam-dey3 avatar Dec 31 '21 10:12 pritam-dey3

I'm using PyHive from Windows with SqlAlchemy and without ssl.

alex-ber avatar Jan 01 '22 17:01 alex-ber

Can you please show me how?

pritam-dey3 avatar Jan 03 '22 05:01 pritam-dey3

Sure.

  1. Put the correct values for user, password, and host below.
  2. Your Hive server should be configured with NOSASL.
  3. The Hive server should work with MapReduce (you can change this, but I didn't test that use case).
  4. The code below was tested on Python 3.8 with the following dependencies:

future==0.18.2
PyHive==0.6.2
python-dateutil==2.8.1
six==1.15.0
SQLAlchemy==1.3.3
thrift==0.13.0

import logging
import time
# Fix for "AttributeError: module 'time' has no attribute 'clock'"
# (time.clock was removed in Python 3.8); must run before sqlalchemy is imported.
time.clock = time.time
from time import sleep
from atexit import register as atexit_register
from sqlalchemy import create_engine
from sqlalchemy.sql import select
from sqlalchemy.schema import MetaData

import pandas as pd
# from alexber.utils.mains import fixabscwd

logger = None


def run():
    user = ''
    password = ''
    host = ''
    port = 10000
    dbname = 'default'

    engine = create_engine(
        # 'hive://localhost:10000/default?auth=NOSASL',
        f'hive://{user}:{password}@{host}:{port}/{dbname}?auth=NOSASL',
        connect_args={'configuration': {'hive.execution.engine': 'mr',
                                        'hive.mapred.mode': 'nonstrict',
                                        'hive.auto.convert.join': 'false',
                                        # 'hive.server2.authentication': 'NOSASL'
                                        },
                      # 'thrift_transport': transport
                      })
    atexit_register(engine.dispose)

    # df = pd.read_sql('select * from db_name.table_name', con=engine, coerce_float=False)

    # df2 = pd.read_sql_table('table_name', schema='db_name', con=engine, coerce_float=False)

    meta = MetaData(engine, schema="poc_test")

    meta.reflect(only=["colors"])  # include_columns="[id,color]"
    colors = meta.tables.get('poc_test.colors')

    # not practical, for demonstration purposes only
    ins = colors.insert().values(color=2)
    print(f"Insert statement {ins}")
    print(f"Insert params {ins.compile().params}")

    engine.execute(ins)
    sleep(60)

    slct = colors.select()
    result = engine.execute(slct)
    print("Select all result")
    for row in result:
        print(row)

    s = select([colors.c.color])
    result = engine.execute(s)
    print("Select color result")
    for row in result:
        print(row)


def main(args=None):
    """
    main method
    :param args: if not None, suppresses sys.argv
    """
    logging.basicConfig(format='%(asctime)-15s %(levelname)s [%(name)s.%(funcName)s] %(message)s',
                        level=logging.INFO)
    logging.getLogger('sqlalchemy').setLevel(logging.DEBUG)
    logging.captureWarnings(True)

    # fixabscwd()

    global logger
    logger = logging.getLogger(__name__)

    run()


if __name__ == "__main__":
    main()
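
For comparison, the same NOSASL setup can be tried without SQLAlchemy through pyhive's DB-API interface directly. This is a minimal sketch under the same assumptions as the steps above (a HiveServer2 configured with NOSASL); the helper names and the example host are hypothetical.

```python
def nosasl_connect_kwargs(host, username, port=10000, database="default"):
    """Keyword arguments for hive.Connection that skip SASL entirely."""
    return {
        "host": host,
        "port": port,
        "username": username,
        "database": database,
        "auth": "NOSASL",  # must match the server's hive.server2.authentication
    }


def fetch_rows(sql, **connect_kwargs):
    """Run one query over a plain (non-SASL) Thrift transport and return all rows."""
    from pyhive import hive  # third-party; imported lazily

    conn = hive.Connection(**connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Example (hypothetical host and user):
# rows = fetch_rows("SHOW TABLES", **nosasl_connect_kwargs("hive-host", "hadoop"))
```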



alex-ber avatar Jan 03 '22 20:01 alex-ber