snakebite
Kerberos - javax.security.sasl.SaslException. Client mechanism is malformed
I am using snakebite with kerberos.
When I do snakebite -D ls /, I notice this error:
status: FATAL
serverIpcVersionNum: 9
exceptionClassName: "javax.security.sasl.SaslException"
errorMsg: "Client mechanism is malformed"
errorDetail: FATAL_UNAUTHORIZED
clientId: ""
retryCount: -1
I have the cyrus-sasl pkg installed. Prior to this failure, snakebite was offered 2 mechanisms - DIGEST-MD5 and GSSAPI. But I am not sure if it chose the right one, because the log line for the chosen mech is empty:
DEBUG:snakebite.channel:Response:
state: NEGOTIATE
auths {
method: "TOKEN"
mechanism: "DIGEST-MD5"
protocol: ""
serverId: "default"
challenge: "some challenge callback"
}
auths {
method: "KERBEROS"
mechanism: "GSSAPI"
protocol: "hdfs"
serverId: "some server"
}
DEBUG:snakebite.rpc_sasl:Available mechs: DIGEST-MD5,GSSAPI
DEBUG:snakebite.rpc_sasl:Chosen mech:
DEBUG:snakebite.channel:Sending: 00 00 00 10 (len: 4)
Not sure if I am missing a package, if there is a bug in the negotiation, or something else.
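As a side note, here is a toy sketch (a hypothetical function, not snakebite's actual code) of what the blank "Chosen mech:" line suggests: the client intersects the server's offered mechanisms with what the local SASL library will actually support, and if nothing matches, an empty mechanism name goes out in the INITIATE message, which the server rejects as malformed.

```python
# Toy illustration only -- not snakebite's real implementation.
# The client picks the first server-offered mechanism that the local
# SASL library also supports; if nothing matches, the "Chosen mech:"
# log line is blank and an empty mechanism is sent to the server.
def choose_mech(offered, locally_supported):
    for mech in offered:
        if mech in locally_supported:
            return mech
    return ""  # empty -> server replies "Client mechanism is malformed"

offered = ["DIGEST-MD5", "GSSAPI"]
print(choose_mech(offered, {"GSSAPI"}))  # a working GSSAPI plugin
print(choose_mech(offered, set()))       # no usable local mechanisms
```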
cc: @bolkedebruin
So you have a valid ticket? Show the output of klist
Yes. klist output:
Ticket cache: FILE:/tmp/krb789_0
Default principal: [email protected]
Valid starting Expires Service principal
10/08/15 20:11:01 10/09/15 20:11:01 krbtgt/[email protected]
renew until 10/15/15 20:11:01
SASL chooses the mech that it can support. I am only getting this output when the ticket has expired.
expired ticket, but available ticket cache:
snakebite -D ls /
DEBUG:snakebite.rpc_sasl:Available mechs: DIGEST-MD5,GSSAPI
DEBUG:snakebite.rpc_sasl:Chosen mech:
Running kdestroy to remove the ticket cache:
kdestroy
snakebite ls /
krbV.Krb5Error: (-1765328189, 'No credentials cache found')
Getting a valid ticket:
kinit bolke
Password for [email protected]
snakebite ls /
Found 7 items
drwxrwxrwx - yarn hadoop 0 2015-08-21 10:08 /app-logs
drwxr-xr-x - hdfs hadoop 0 2015-08-21 10:07 /apps
drwxr-xr-x - hdfs hadoop 0 2015-08-21 10:06 /hdp
drwxr-xr-x - mapred hdfs 0 2015-08-21 10:06 /mapred
drwxrwxrwx - mapred hadoop 0 2015-08-21 10:06 /mr-history
drwxrwxrwx - hdfs hdfs 0 2015-08-21 10:10 /tmp
drwxr-xr-x - hdfs hdfs 0 2015-08-21 10:07 /user
so try kdestroy & kinit
Hmm... yeah, tried kdestroy and kinit.
kdestroy does what it was supposed to.
kinit with a keytab worked, but snakebite ls / erred out again with Client mechanism is malformed.
This shouldn't matter, but does it make a difference if I use a keytab to kinit instead of a straight-up password?
Can you check if this still works for you if you use a keytab file instead? @bolkedebruin
For whatever reason, I have duplicate keys in the keytab file (with the same principal and kvno). Saw this by doing klist -k -t user.keytab
Wonder if that could be an issue.
Might be. In my environment it works fine with a key tab (just tested it). Do other utilities work? ie. hadoop command line?
hmm.....ok. yea. my hadoop command line works fine.
Ok. To be honest I am a bit lost, because I cannot reproduce it except with an expired ticket in a cache. Furthermore, not selecting a mech is dependent on the underlying SASL (system) libraries. What you could try is to shoot a bit broader: create a new user principal and try using that to connect, and/or generate a new keytab.
If you share your outputs, please include everything (i.e. full command lines and debug output).
Here is the complete output. I don't own the hadoop or kerberos systems I am working with. I am trying to set up a small dev environment to debug better.
[praveev@flame03 ~]$ /usr/bin/kinit -V -k -t /home/praveev/praveev.prod.headless.keytab praveev
Using default cache: /tmp/krb5cc_90677
Using principal: [email protected]
Using keytab: /home/praveev/praveev.keytab
Authenticated to Kerberos v5
[praveev@flame03 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_90677
Default principal: [email protected]
Valid starting Expires Service principal
10/08/15 15:36:52 10/09/15 15:36:52 krbtgt/[email protected]
renew until 10/15/15 15:36:52
[praveev@flame03 ~]$ snakebite -D ls /
DEBUG:snakebite.config:Got namenode 'hdfs://master-nn:8020/' from /home/praveev/hadoop/conf/core-site.xml
DEBUG:snakebite.config:Got hadoop.security.authentication 'kerberos'
DEBUG:snakebite.config:hdfs principal found: '[email protected]'
DEBUG:snakebite.client:Switch to namenode: master-nn:8020
DEBUG:snakebite.client:Created client for master-nn:8020 with trash=False and sasl=True
DEBUG:snakebite.client:Trying to find path /
DEBUG:snakebite.channel:############## CONNECTING ##############
DEBUG:snakebite.channel:Sending: 68 72 70 63 (len: 4)
DEBUG:snakebite.channel:Sending: 09 (len: 1)
DEBUG:snakebite.channel:Sending: 00 (len: 1)
DEBUG:snakebite.channel:Sending: df (len: 1)
DEBUG:snakebite.channel:Sending: 00 00 00 0e (len: 4)
DEBUG:snakebite.channel:Sending: 0a (len: 1)
DEBUG:snakebite.channel:Sending: 08 02 10 00 18 41 22 00 28 01 (len: 10)
DEBUG:snakebite.channel:Sending: 02 (len: 1)
DEBUG:snakebite.channel:Sending: 10 01 (len: 2)
DEBUG:snakebite.rpc_sasl:Send out:
state: NEGOTIATE
DEBUG:snakebite.channel:############## RECVING ##############
DEBUG:snakebite.channel:############## PARSING ##############
DEBUG:snakebite.channel:Payload class: <class 'snakebite.protobuf.RpcHeader_pb2.RpcSaslProto'>
DEBUG:snakebite.channel:Bytes read: 4, total: 4
DEBUG:snakebite.channel:Total response length: 226
DEBUG:snakebite.channel:Bytes read: 4, total: 8
DEBUG:snakebite.channel:Delimited message length (pos 1): 14
DEBUG:snakebite.channel:Rewinding pos 7 with 3 places
DEBUG:snakebite.channel:Reset buffer to pos 4
DEBUG:snakebite.channel:Bytes read: 11, total: 19
DEBUG:snakebite.channel:Delimited message bytes (14): 08 df ff ff ff 0f 10 00 18 09 3a 00 40 01 (len: 14)
DEBUG:snakebite.channel:Header read 15
DEBUG:snakebite.channel:RpcResponseHeaderProto:
callId: 4294967263
status: SUCCESS
serverIpcVersionNum: 9
clientId: ""
retryCount: -1
DEBUG:snakebite.channel:header: 15, total: 226
DEBUG:snakebite.channel:Bytes read: 211, total: 230
DEBUG:snakebite.channel:Delimited message length (pos 2): 209
DEBUG:snakebite.channel:Rewinding pos 229 with 209 places
DEBUG:snakebite.channel:Reset buffer to pos 20
DEBUG:snakebite.channel:Delimited message bytes (209): some bytes (len: 209)
DEBUG:snakebite.channel:Response:
state: NEGOTIATE
auths {
method: "TOKEN"
mechanism: "DIGEST-MD5"
protocol: ""
serverId: "default"
challenge: "some-challenge"
}
auths {
method: "KERBEROS"
mechanism: "GSSAPI"
protocol: "hdfs"
serverId: "master-nn"
}
DEBUG:snakebite.rpc_sasl:Available mechs: DIGEST-MD5,GSSAPI
DEBUG:snakebite.rpc_sasl:Chosen mech:
DEBUG:snakebite.channel:Sending: 00 00 00 10 (len: 4)
DEBUG:snakebite.channel:Sending: 0a (len: 1)
DEBUG:snakebite.channel:Sending: 08 02 10 00 18 41 22 00 28 01 (len: 10)
DEBUG:snakebite.channel:Sending: 04 (len: 1)
DEBUG:snakebite.channel:Sending: 10 02 1a 00 (len: 4)
DEBUG:snakebite.rpc_sasl:Send out:
state: INITIATE
token: ""
DEBUG:snakebite.channel:############## RECVING ##############
DEBUG:snakebite.channel:############## PARSING ##############
DEBUG:snakebite.channel:Payload class: <class 'snakebite.protobuf.RpcHeader_pb2.RpcSaslProto'>
DEBUG:snakebite.channel:Bytes read: 4, total: 4
DEBUG:snakebite.channel:Total response length: 83
DEBUG:snakebite.channel:Bytes read: 4, total: 8
DEBUG:snakebite.channel:Delimited message length (pos 1): 82
DEBUG:snakebite.channel:Rewinding pos 7 with 3 places
DEBUG:snakebite.channel:Reset buffer to pos 4
DEBUG:snakebite.channel:Bytes read: 79, total: 87
DEBUG:snakebite.channel:Delimited message bytes (82): 08 df ff ff ff 0f 10 02 18 09 22 21 6a 61 76 61 78 2e 73 65 63 75 72 69 74 79 2e 73 61 73 6c 2e 53 61 73 6c 45 78 63 65 70 74 69 6f 6e 2a 1d 43 6c 69 65 6e 74 20 6d 65 63 68 61 6e 69 73 6d 20 69 73 20 6d 61 6c 66 6f 72 6d 65 64 30 0f 3a 00 40 01 (len: 82)
DEBUG:snakebite.channel:Header read 83
DEBUG:snakebite.channel:RpcResponseHeaderProto:
callId: 4294967263
status: FATAL
serverIpcVersionNum: 9
exceptionClassName: "javax.security.sasl.SaslException"
errorMsg: "Client mechanism is malformed"
errorDetail: FATAL_UNAUTHORIZED
clientId: ""
retryCount: -1
DEBUG:snakebite.client:Request failed with javax.security.sasl.SaslException
Client mechanism is malformed
Request error: javax.security.sasl.SaslException
Client mechanism is malformed
@bolkedebruin making progress! I got it working with kerberos on the command line, but it does not work from the Airflow code. I suspect it has something to do with the principal name in airflow.cfg.
Should the principal be in this format: hdfs/[email protected] where _host will get resolved into the hostname?
Excellent! Can you share what did you do to get it to work to also help others?
For airflow: it should be the principal that you can access your environment with, i.e. your own. Besides that, the principal gets augmented with the host (which maybe it should not do, now that I am thinking about it).
In hdfs-site.xml, I incorrectly had dfs.namenode.kerberos.principal set to praveev (my default principal) instead of hdfs/[email protected]. I think it has to start with hdfs/. Here is a copy of the hdfs-site and core-site that worked for me.
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/[email protected]</value>
    <description>
      Kerberos principal name for the NameNode
    </description>
  </property>
</configuration>
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="/home/gs/conf/local/local-superuser-conf.xml" />
  <!-- file system properties -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://{hadoop-namenode}:8020</value>
    <description>The name of the default file system. Either the
    literal string "local" or a host:port for HDFS.</description>
    <final>false</final>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
    <description>
      Set the authentication for the cluster. Valid values are: simple or
      kerberos.
    </description>
  </property>
</configuration>
@bolkedebruin - for the airflow part, maybe we should take this conversation to the airflow repo. Here are the kerberos-related parts of my airflow.cfg, and this doesn't work:
[core]
security = kerberos

[kerberos]
ccache = /tmp/krb5cc_90677
principal = praveev
reinit_frequency = 3600
kinit_path = kinit
keytab = /home/praveev/praveev.keytab
For the command line, the principal is of the form hdfs/[email protected] where _HOST is my hostname. It doesn't use the default principal in the ticket.
For airflow, in airflow/security/kerberos.py, principal is defined as
principal = "%s/%s" % (conf.get('kerberos', 'principal'), socket.getfqdn())
which is not quite the same as the one from the command line, because I don't see it include the @ part (i.e. @LOCAL.COM).
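To make the mismatch concrete, here is a small sketch (the helper name is hypothetical; the format string is the one quoted from airflow/security/kerberos.py) showing that no @REALM suffix is ever appended:

```python
import socket

def build_airflow_principal(conf_principal, fqdn=None):
    # Mirrors the format string quoted above from
    # airflow/security/kerberos.py: "<principal>/<fqdn>",
    # with no "@REALM" part appended.
    if fqdn is None:
        fqdn = socket.getfqdn()
    return "%s/%s" % (conf_principal, fqdn)

# With principal = "praveev" and host "flame03.local.com" this yields
# "praveev/flame03.local.com" -- note the missing "@LOCAL.COM" realm.
print(build_airflow_principal("praveev", "flame03.local.com"))
```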
My klist:
Ticket cache: FILE:/tmp/krb5cc_90677
Default principal: [email protected]
Valid starting     Expires            Service principal
10/09/15 12:18:57  10/10/15 12:18:57  krbtgt/[email protected]
        renew until 10/16/15 12:18:57
10/09/15 12:19:25  10/10/15 12:18:57  hdfs/[email protected]
        renew until 10/16/15 12:18:57
I got the service principal, hdfs/[email protected], after executing from the command line.
@bolkedebruin I am unable to replicate the command line behavior with the library.
Script-1:
from snakebite.client import Client
client = Client("hadoop-namenode", 8020, use_trash=True, use_sasl=True)
for x in client.ls(['/']):
    print x

Script-2:
from snakebite.client import Client
client = Client("hadoop-namenode", 8020, use_trash=True, use_sasl=True, effective_user="hdfs/[email protected]")
for x in client.ls(['/']):
    print x
hadoop-namenode is the host from core-site.xml; effective_user is the principal value from hdfs-site.xml.
I think you need to read up a little on how Kerberos works and how it works in a Hadoop environment.
For now I would leave out the effective user, specifically because you are configuring it as the hdfs administrative user.
For the principal, use the principal that you are doing a kinit with.
And yes this thread should be on the airflow mailing list.
I'm having the same issue, though nothing to do with Airflow.
The CLI works fine:
root@ip-10-0-0-165:~# snakebite -v
2.7.2
root@ip-10-0-0-165:~# klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_0)
root@ip-10-0-0-165:~# snakebite ls /
FAILS as expected
root@ip-10-0-0-165:~# kinit edh
Password for edh@HADOOP:
root@ip-10-0-0-165:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: edh@HADOOP
Valid starting Expires Service principal
10/22/2015 22:20:11 10/23/2015 22:20:11 krbtgt/HADOOP@HADOOP
renew until 10/29/2015 22:20:11
root@ip-10-0-0-165:~# snakebite ls /
Found 3 items
drwxrwxr-x - solr solr 0 2015-10-20 21:05 /solr
drwxrwxrwx - hdfs supergroup 0 2015-10-22 00:20 /tmp
drwxr-xr-x - hdfs supergroup 0 2015-10-20 21:04 /user
The Python code fails:
>>> import os
>>> from snakebite.client import Client
>>> c = Client('ip-10-0-0-166.us-west-1.compute.internal', use_sasl=True)
>>> os.system("klist")
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: edh@HADOOP
Valid starting Expires Service principal
10/22/2015 22:23:31 10/23/2015 22:23:31 krbtgt/HADOOP@HADOOP
renew until 10/29/2015 22:23:31
>>> list(c.ls(['/']))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/snakebite/client.py", line 152, in ls
recurse=recurse):
File "/usr/local/lib/python2.7/dist-packages/snakebite/client.py", line 1198, in _find_items
fileinfo = self._get_file_info(path)
File "/usr/local/lib/python2.7/dist-packages/snakebite/client.py", line 1326, in _get_file_info
return self.service.getFileInfo(request)
File "/usr/local/lib/python2.7/dist-packages/snakebite/service.py", line 35, in <lambda>
rpc = lambda request, service=self, method=method.name: service.call(service_stub_class.__dict__[method], request)
File "/usr/local/lib/python2.7/dist-packages/snakebite/service.py", line 41, in call
return method(self.service, controller, request)
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/service_reflection.py", line 267, in <lambda>
self._StubMethod(inst, method, rpc_controller, request, callback))
File "/usr/local/lib/python2.7/dist-packages/google/protobuf/service_reflection.py", line 284, in _StubMethod
method_descriptor.output_type._concrete_class, callback)
File "/usr/local/lib/python2.7/dist-packages/snakebite/channel.py", line 435, in CallMethod
self.get_connection(self.host, self.port)
File "/usr/local/lib/python2.7/dist-packages/snakebite/channel.py", line 241, in get_connection
sasl_connected = sasl.connect()
File "/usr/local/lib/python2.7/dist-packages/snakebite/rpc_sasl.py", line 97, in connect
res = self._recv_sasl_message()
File "/usr/local/lib/python2.7/dist-packages/snakebite/rpc_sasl.py", line 78, in _recv_sasl_message
sasl_response = self._trans.parse_response(bytestream, RpcSaslProto)
File "/usr/local/lib/python2.7/dist-packages/snakebite/channel.py", line 411, in parse_response
self.handle_error(header)
File "/usr/local/lib/python2.7/dist-packages/snakebite/channel.py", line 414, in handle_error
raise RequestError("\n".join([header.exceptionClassName, header.errorMsg]))
snakebite.errors.RequestError: javax.security.sasl.SaslException
Client mechanism is malformed
Same result when running the code via python test.py (without the REPL):
from snakebite.client import Client
c = Client('ip-10-0-0-166.us-west-1.compute.internal', use_sasl=True)
print list(c.ls(['/']))
@eschlon use AutoConfigClient. http://snakebite.readthedocs.org/en/latest/client.html#snakebite.client.AutoConfigClient
@praveev Thanks, I will try that tomorrow morning and report back.
You're a wizard. AutoConfigClient works like a charm.
I'll check whether this is really an issue with snakebite or a user config issue.
Any update on this issue?
Any update on this? I ran into the same issue. This doesn't work:
nn = [Namenode('HOST', 8020, 9)]
client = HAClient(nn, use_trash=True, effective_user=None, use_sasl=True)
while this works:
client = AutoConfigClient()
I checked the config AutoConfigClient generated and it's exactly the same as the values I passed to HAClient.
@praveev @eschlon @bolkedebruin @wouterdebie Did you guys have any update on this? Please see my post above.
AutoConfigClient worked for me, and I printed out the config in AutoConfigClient. Then I used those values to create clients with HAClient and Client, but they both returned the javax.security.sasl.SaslException: Client mechanism is malformed error message.
It seemed really weird, since AutoConfigClient is essentially calling HAClient with some params. If I call HAClient manually with those params, it just didn't work.
Really appreciate any help or pointers!
BTW I'm trying to get Airflow to work, but the code above is not related to Airflow so the issue should be Airflow-independent.
I think I found the cause of the problem (in my case).
__init__ in AutoConfigClient calls:
configs = HDFSConfig.get_external_config()
which has side effects. HDFSConfig.get_external_config() calls read_hdfs_config(), where there are two lines:
https://github.com/spotify/snakebite/blob/master/snakebite/config.py#L90
https://github.com/spotify/snakebite/blob/master/snakebite/config.py#L94
This is the only way cls.hdfs_namenode_principal gets set. If you use HAClient or Client to create the connection, cls.hdfs_namenode_principal will never get set. Thus the failure.
Client and HAClient should probably take hdfs_namenode_principal as an argument, or their __init__ should read the value in some other way.
Thoughts? @bolkedebruin @wouterdebie
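A self-contained toy (hypothetical class name, faked config values - not snakebite's real code) of the behavior described above: the principal lives in class-level state that only the config-reading classmethod populates, so constructing a client directly leaves it unset.

```python
# Toy model of the class-attribute behavior described above; the names
# and values here are illustrative, not snakebite's actual code.
class HDFSConfigSketch(object):
    hdfs_namenode_principal = None  # class attribute, shared globally

    @classmethod
    def get_external_config(cls):
        # The real read_hdfs_config() parses hdfs-site.xml; we fake it.
        cls.hdfs_namenode_principal = "hdfs/[email protected]"
        return [{"namenode": "master-nn", "port": 8020}]

# Constructing HAClient/Client directly never runs the classmethod,
# so the principal stays unset and SASL negotiation fails:
before = HDFSConfigSketch.hdfs_namenode_principal

# AutoConfigClient's __init__ does run it, which sets the principal
# as a side effect of reading the config:
HDFSConfigSketch.get_external_config()
after = HDFSConfigSketch.hdfs_namenode_principal

print(before)
print(after)
```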
@ravwojdyla any thoughts on this?
@garthcn I completely agree with your assessment, and thanks for tracking this down. I wouldn't call it a side effect that the hdfs principal is set by the auto configuration. Maybe a proper fix is to append to the configs, like in line 85. But that is just from looking at this piece of code; I haven't thought about the implications.
@bolkedebruin
We can probably make lines 12~15 here into instance variables. Then read_hdfs_config() and read_core_config() can return something like:
{
    namenodes: [],
    use_trash: True,
    use_sasl: True,
    hdfs_namenode_principal: <principal>
}
Then, change Client and HAClient to take in hdfs_namenode_principal; and AutoConfigClient will read the auto-configured hdfs_namenode_principal from the dict returned above.
We also need to change all the places where HDFSConfig.use_trash, HDFSConfig.use_sasl and HDFSConfig.hdfs_namenode_principal are used (e.g. here).
@garthcn to me that seems the most elegant solution. For Airflow I will create a patch that allows you to specify the principal or to use the autoconfig functionality.
@bolkedebruin sounds good. @ravwojdyla @wouterdebie any thoughts on the proposal above? I'm working on a pull request.