Kerberos (hive/presto) access documentation

Open parisni opened this issue 8 years ago • 12 comments

Hi

Apparently (#47, #91) kerberized access is available. However, there is no example of how to use it in the documentation.

An example would be more than helpful.

Thanks

parisni avatar Nov 05 '17 16:11 parisni

Hi:

Where can we access the documentation? We need it.

Thanks

ExpressGit avatar May 10 '18 05:05 ExpressGit

Looking at https://github.com/dropbox/PyHive/blob/master/pyhive/hive.py, here is how:

sudo apt-get install libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit
pip install Pyhive sasl thrift_sasl 

Make sure Kerberos is configured in /etc/krb5.conf, then run kinit with your keytab.

With PyHive:

from pyhive import hive

conn = hive.Connection(host="<hive-host>", port=<hive-port>, username="<kerberos-username>", database='<db-name>', auth='KERBEROS', kerberos_service_name="hive")
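A minimal sketch of how the hive.Connection call above could be used to run a query, assuming a valid Kerberos ticket already exists. The helper names (kerberos_connect_kwargs, run_query), the host, and the SHOW TABLES statement are illustrative placeholders, not part of PyHive.

```python
def kerberos_connect_kwargs(host, port, username, database, service_name="hive"):
    """Collect the keyword arguments used in the hive.Connection call above."""
    return {
        "host": host,
        "port": port,
        "username": username,
        "database": database,
        "auth": "KERBEROS",
        "kerberos_service_name": service_name,
    }


def run_query(sql, **connect_kwargs):
    """Open a connection, run one statement, and return all rows."""
    # Imported lazily so the helper above can be used without PyHive installed.
    from pyhive import hive

    conn = hive.Connection(**connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Example usage (placeholder values, needs a reachable HiveServer2 and a ticket):
# kwargs = kerberos_connect_kwargs("hive.example.com", 10000, "alice", "default")
# print(run_query("SHOW TABLES", **kwargs))
```

Closing the connection in a finally block keeps the Thrift transport from lingering if the query fails.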

With SQLAlchemy:

from sqlalchemy import create_engine

engine = create_engine("hive://<kerberos-username>@<hive-host>:<hive-port>/<db-name>", connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})
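A small sketch of the same SQLAlchemy setup with the URL built from its parts, assuming PyHive's SQLAlchemy dialect is installed and a ticket is already in place. The function names and the example values are hypothetical.

```python
def hive_kerberos_url(username, host, port, database):
    """Build a hive:// URL in the same shape as the one above."""
    return f"hive://{username}@{host}:{port}/{database}"


def make_engine(url, service_name="hive"):
    """Create an engine that passes the Kerberos options through to PyHive."""
    # Imported lazily so the URL helper works without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(
        url,
        connect_args={"auth": "KERBEROS", "kerberos_service_name": service_name},
    )


# Example usage (placeholder values, needs a reachable HiveServer2 and a ticket):
# from sqlalchemy import text
# engine = make_engine(hive_kerberos_url("alice", "hive.example.com", 10000, "default"))
# with engine.connect() as conn:
#     print(conn.execute(text("SHOW TABLES")).fetchall())
```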

Dubrzr avatar Sep 13 '18 13:09 Dubrzr

If the keytab must be kinit-ed before connecting, does that mean I have to run the PyHive code on the Hadoop cluster?

hellofuturecyj avatar Nov 21 '18 06:11 hellofuturecyj

Nope, you can kinit from a remote computer (through port 88, TCP and UDP) and then use PyHive remotely. We do just that.
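One way to check whether port 88 is reachable from the remote machine before trying kinit is a plain TCP connect. This is only a sketch: kdc_tcp_reachable is a hypothetical helper, the host name is a placeholder, and it does not test UDP.

```python
import socket


def kdc_tcp_reachable(kdc_host, port=88, timeout=3.0):
    """Return True if a TCP connection to the KDC port can be opened."""
    try:
        with socket.create_connection((kdc_host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (placeholder host name):
# print(kdc_tcp_reachable("kdc.example.com"))
```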

Dubrzr avatar Nov 21 '18 07:11 Dubrzr

@Dubrzr: Could you give an example, or point to a link where you've done this?

samarth-goel-guavus avatar Dec 10 '18 05:12 samarth-goel-guavus

@samarth-goel-guavus

  1. Install a Kerberos client on your own computer.
  2. Configure Kerberos on your computer (/etc/krb5.conf) so that it connects to the remote Kerberos server.
  3. Given a keytab file (provided by your Kerberos administrator), authenticate your computer to the remote Kerberos server using kinit -kt your.keytab username@YOUR_KERBEROS_REALM
  4. Check that you have a valid Kerberos ticket using klist
  5. You can now use PyHive with Kerberos.
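Steps 3 and 4 can also be driven from Python with subprocess. A rough sketch, assuming the MIT Kerberos kinit and klist binaries are on the PATH; the function names are hypothetical, and klist -s is the MIT option that prints nothing and reports ticket validity through its exit status.

```python
import subprocess


def kinit_command(keytab, principal):
    """Build the kinit invocation from step 3 as an argument list."""
    return ["kinit", "-kt", keytab, principal]


def has_valid_ticket():
    """Return True if klist finds a readable, unexpired credential cache."""
    return subprocess.run(["klist", "-s"]).returncode == 0


def login_with_keytab(keytab, principal):
    """Run kinit with the keytab, then confirm a ticket was obtained."""
    subprocess.run(kinit_command(keytab, principal), check=True)
    return has_valid_ticket()


# Example usage (placeholder values from the steps above):
# login_with_keytab("your.keytab", "username@YOUR_KERBEROS_REALM")
```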

Dubrzr avatar Dec 10 '18 08:12 Dubrzr

Is there also a solution for Windows? The example Hive connection above does not seem to work on my Windows client (kinit has worked fine on my PC for years). The Hive server log says: java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream

dmueller1607 avatar Nov 29 '19 09:11 dmueller1607

How can we implement this in a Docker setup? And how would a user add a Hive data source through the Superset UI in this case?

abiwill avatar Apr 22 '20 15:04 abiwill

@Dubrzr Thanks for providing examples here. It would be nice if you could add this information to the README.

bkyryliuk avatar Apr 28 '20 16:04 bkyryliuk

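I did all of this, but it did not work. However, creating a credential cache file and setting the KRB5CCNAME environment variable did the trick for me.

import os

# Run kinit so the ticket is written to an explicit credential cache file
cmd = f'kinit -kt {keytab_file} -c {ccache_file} {principal}'
...
# Point Kerberos-aware libraries at that cache before connecting
os.environ['KRB5CCNAME'] = ccache_file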
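A fuller sketch of this credential cache approach, written as a guess at what the elided part might look like rather than the original code. It assumes kinit is on the PATH; the function names and the example paths are hypothetical.

```python
import os
import subprocess


def kinit_to_cache_command(keytab_file, ccache_file, principal):
    """Build a kinit invocation that writes the ticket to ccache_file."""
    return ["kinit", "-kt", keytab_file, "-c", ccache_file, principal]


def point_process_at_cache(ccache_file):
    """Make Kerberos libraries in this process read tickets from ccache_file."""
    os.environ["KRB5CCNAME"] = ccache_file


def login_with_dedicated_cache(keytab_file, ccache_file, principal):
    """Obtain a ticket into a dedicated cache and select it for this process."""
    subprocess.run(
        kinit_to_cache_command(keytab_file, ccache_file, principal), check=True
    )
    point_process_at_cache(ccache_file)


# Example usage (placeholder values), before opening the PyHive connection:
# login_with_dedicated_cache("your.keytab", "/tmp/krb5cc_pyhive",
#                            "username@YOUR_KERBEROS_REALM")
```

Setting KRB5CCNAME only affects the current process and its children, so the connection has to be opened from the same process after this call.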

lsgrep avatar Mar 02 '21 11:03 lsgrep

@lsgrep Can you provide the full code snippet? It would be very helpful.

ghost avatar Jun 10 '21 05:06 ghost

I am following all of these steps but getting this error: raise TTransportException(type=TTransportException.NOT_OPEN, thrift.transport.TTransport.TTransportException: Bad status: 3 (b'GSS initiate failed')

sushma1918 avatar Jan 11 '23 15:01 sushma1918