dask-yarn
MapR non-standard Hadoop security not supported
Hi guys,
I'm sorry if this is a dumb or repeated question. We have been trying to follow the example to bring up Dask on YARN and keep getting the error "Kerberos ticket not found, please kinit and restart", even though the user starting the cluster does have a valid ticket.
Is there any way to point explicitly to the ticket location when the cluster starts? We have a MapR Hadoop cluster and want to use Dask on YARN. Has anybody worked with this combination (MapR Hadoop cluster / Dask on YARN)? Any pointers would be highly appreciated.
Thank you! Andre
Error attached. log-daskyarn.txt
This has to do with MapR forking Hadoop and not providing a 100% compatible authentication mechanism. The problematic code is here:
https://github.com/jcrist/skein/blob/6ac489e139f5169caae3a7a8415c92c418250e92/java/src/main/java/com/anaconda/skein/Driver.java#L248-L261
To provide a better error message than the Hadoop APIs do (just a deadlock :/), we try to detect users forgetting to log in before instantiating UserGroupInformation. Whatever MapR has done breaks this detection logic.
If you have any suggestions, I'd welcome a PR adding MapR support. I don't have a MapR cluster available for testing.
Hi there,
thanks for the reply. We are in contact with MapR support and will try to find a way forward on this. We do have a MapR cluster, and we will share here whatever help we get from them.
@jcrist I am running into this exact same issue with a MapR cluster. Is there a temporary workaround that we could do? I would be interested in contributing this feature to dask-yarn. What do you think a fix would require?
@andregouveiasantana did you hear back from MapR about this issue?
The issue here is our check for whether a user is appropriately logged in before any requests are made. The Hadoop APIs block if the user isn't logged in (unfortunate design), so I'd like to keep this check around to provide a nicer user experience. Because MapR forked Hadoop, our check is incorrect on their distribution. Without access to a MapR cluster to test on, I'm not sure what to do here. The MapR sandbox VM doesn't have security enabled. If you know of a way to get that test setup working and reproducible, then I can take a look.
Unfortunately I don't have any way to get a test setup working and reproducible. But I am happy to test for you, post the workaround/solution, and help in any way that I can.
> The issue here is our check for whether a user is appropriately logged in before any requests are made.
So as a hack for now would patching out this check (assuming that I am properly logged in) work?
> The Hadoop APIs block if the user isn't logged in
By blocks do you mean that it just hangs and doesn't give a response one way or the other?
> But I am happy to test for you and post the work around/solution and help in any way that I can.
I'd need to experiment with a running MapR install to figure out how their fork is different, which would be hard to do remotely.
> So as a hack for now would patching out this check (assuming that I am properly logged in) work?
Yes.
> By blocks do you mean that it just hangs and doesn't give a response one way or the other?
Yes, an error is logged but the request just hangs forever.
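Since the request hangs rather than failing, one client-side mitigation is to run the potentially blocking call in a helper thread and stop waiting after a deadline. The following is only a minimal Python sketch, not part of dask-yarn or skein: `call_with_timeout` is a hypothetical helper, and it merely gives up waiting (the underlying call keeps running in a daemon thread until the interpreter exits).

```python
import threading


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) and raise TimeoutError if it takes too long.

    Hypothetical helper for illustration. The blocked call is NOT
    interrupted; it is left running in a daemon thread.
    """
    outcome = {}

    def target():
        # Capture either the return value or the exception for the caller.
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

A caller could wrap whatever call ends up blocking (for example, cluster creation) and report a readable error instead of hanging. Whether that is sufficient depends on where the hang actually occurs in a given setup.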
Hey guys,
Sorry for the delay in replying. We had been waiting for an answer from MapR, which took longer than expected. Unfortunately, it pretty much stated the obvious: the issue is related to the specific authentication method used by MapR's implementation. I am attaching their reply for now, though I didn't see anything in it that could help. I am trying to get them to give more details about their implementation and to help pin down exactly what needs to be done. @jcrist, thanks for leaving the issue open; we will try whatever we can to get further information. Maybe you can also tell me what exactly is needed from MapR. I invited the developer to this thread, so maybe he or she will be willing to join. I will keep you guys posted.
Andre
Attachment: email.txt
What I want to know is how to check beforehand, from a UserGroupInformation object, whether the user is authenticated in a MapR context. Have they added another method to check whether MAPRSASL authentication succeeded? Is there a way to detect that a user is running on a MapR cluster instead of standard Hadoop? Since MapR is closed source, I can't determine this myself. If you're still in contact with them, this would be good to know.
@jcrist, I have asked MapR for this information and whether they can join the conversation here. Hopefully they can help. @costrouc, have you managed to get it working with the workaround of patching out the authentication check?
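While waiting for an answer from MapR, a rough host-level heuristic can at least guess whether a MapR client is installed. The Python sketch below is an assumption, not something confirmed in this thread: it relies on MapR's default install directory `/opt/mapr` and the `MAPR_HOME` environment variable, and `looks_like_mapr` is a hypothetical helper. It says nothing about whether MAPRSASL authentication succeeded, which is the part that still needs input from MapR.

```python
import os


def looks_like_mapr(env=None, install_dir="/opt/mapr"):
    """Guess whether this host has a MapR client installation.

    Heuristic only (assumption): checks MAPR_HOME and the default
    install directory. It does not inspect any authentication state.
    """
    env = os.environ if env is None else env
    candidates = [env.get("MAPR_HOME"), install_dir]
    # Any existing candidate directory counts as a hint of a MapR install.
    return any(path and os.path.isdir(path) for path in candidates)
```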
I can confirm similar issues with HortonWorks Hadoop.
Some non-substantive research (Google queries) led me to: https://community.cloudera.com/t5/Support-Questions/Connecting-to-Kerberos-Enabled-hive-via-JDBC-directly-from/m-p/95833
Having looked through the https://github.com/jcrist/skein codebase (used by dask-yarn for YARN connectivity), I wonder whether calling "getLoginUser()" is the best approach.
I suggest changing it to use "getUGIFromTicketCache(ticketCache, userId)" and adding ticketCache and userId as Driver arguments.
Note that in my use case we explicitly run kinit before instantiating dask-yarn/skein, and 'klist' reports a valid, non-expired Kerberos ticket.
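The same `klist` check can be scripted before starting the cluster. This is a minimal Python sketch, assuming an MIT-style `klist` whose `-s` flag runs silently and returns exit status 0 only when the credential cache is readable and not expired. `has_valid_ticket` is a hypothetical helper, not part of dask-yarn or skein. The optional `ticket_cache` argument sets the standard `KRB5CCNAME` environment variable for the `klist` call only.

```python
import os
import subprocess


def has_valid_ticket(ticket_cache=None, klist_cmd=("klist", "-s")):
    """Return True if klist reports a usable Kerberos ticket.

    ticket_cache: optional cache location exported as KRB5CCNAME
    for the klist subprocess only.
    klist_cmd: command to run; the default assumes MIT-style `klist -s`.
    """
    env = dict(os.environ)
    if ticket_cache is not None:
        env["KRB5CCNAME"] = ticket_cache
    try:
        result = subprocess.run(
            list(klist_cmd),
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # klist is missing or not executable: treat as "no ticket".
        return False
    return result.returncode == 0
```

A passing check here only confirms what `klist` sees. As this thread shows, the Java-side check can still disagree.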
dask-yarn (and skein, the underlying YARN client library) have been used successfully on Hortonworks installations in the past (I've done it myself, and I know others who have as well). AFAIK Hortonworks hasn't done anything special with their distribution, and skein works just fine with standard Hadoop (while MapR has a fork with additional features we don't support, which is what this issue is about).
If you're having issues on Hortonworks, please file a new issue in https://github.com/jcrist/skein where we can discuss them.
Submitted a PR for skein: https://github.com/jcrist/skein/pull/235
@jcrist Can you please review the fix for this issue?
Any update on a patch for this issue? I am using MapR Hadoop too and am facing exactly the same issue in 2023.