
I can't start a Spark cluster kernel with Kerberos.


Good day!

I can't start a Spark cluster kernel.

Description

I set up a notebook with a gateway link: `--gateway-url=http://enterprise-gateway:8888`. It is not clear from the instructions how to interact with Kerberos. I created a hadoop.proxyuser entry. When I launch the enterprise-gateway pod, I obtain a TGT. I don't understand whether I also need to obtain a TGT for KERNEL_USERNAME. I also see a strange request coming from the proxy user. Does it need to be granted access to all of the stored YARN logs?
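
Concretely, the pieces I have in place look roughly like this (a sketch; `eg-service` and the keytab path stand in for the actual account the gateway pod runs as):

```sh
# core-site.xml proxyuser entries created for the gateway's service user
# (shown as property = value for brevity):
#   hadoop.proxyuser.eg-service.hosts  = *
#   hadoop.proxyuser.eg-service.groups = *

# Inside the enterprise-gateway pod, the service principal's TGT is obtained
# and visible:
kinit -kt /etc/security/keytabs/eg.service.keytab eg-service@EXAMPLE.COM
klist
```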

Screenshots / Logs

```
Starting IPython kernel for Spark in Yarn Cluster mode on behalf of user my-user

+ eval exec /opt/spark/bin/spark-submit ' -v ' '--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PATH=/opt/Anaconda-2020.11-1.0/bin/python ${KERNEL_EXTRA_SPARK_OPTS}' '' /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py '' --RemoteProcessProxy.kernel-id 74d3c3ef-6bfb-478c-9398-ff9e471f0ea1 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.response-address 1.3.32.200:44837 --RemoteProcessProxy.spark-context-initialization-mode eager
++ exec /opt/spark/bin/spark-submit -v --master yarn --deploy-mode cluster --name 74d3c3ef-6bfb-478c-9398-ff9e471f0ea1 --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PATH=/opt/Anaconda-2020.11-1.0/bin/python /usr/local/share/jupyter/kernels/spark_python_yarn_cluster/scripts/launch_ipykernel.py --RemoteProcessProxy.kernel-id 74d3c3ef-6bfb-478c-9398-ff9e471f0ea1 --RemoteProcessProxy.port-range 0..0 --RemoteProcessProxy.response-address 1.3.32.200:44837 --RemoteProcessProxy.spark-context-initialization-mode eager
[D 2021-11-10 16:06:42.032 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:06:43.035 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:06:44.336 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:06:45.234 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:06:46.034 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:06:46.935 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:06:47.833 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
...
[D 2021-11-10 16:09:41.232 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:09:41.433 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:09:41.433 EnterpriseGatewayApp] BaseProcessProxy.terminate(): None
[D 2021-11-10 16:09:41.632 EnterpriseGatewayApp] ApplicationID not yet assigned for KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' - retrying...
[D 2021-11-10 16:09:41.632 EnterpriseGatewayApp] YarnClusterProcessProxy.kill, application ID: None, kernel ID: 74d3c3ef-6bfb-478c-9398-ff9e471f0ea1, state: None, result: None
[D 2021-11-10 16:09:41.633 EnterpriseGatewayApp] response socket still open, close it
[E 2021-11-10 16:09:41.633 EnterpriseGatewayApp] KernelID: '74d3c3ef-6bfb-478c-9398-ff9e471f0ea1' launch timeout due to: Application ID is None. Failed to submit a new application to YARN within 180.0 seconds. Check Enterprise Gateway log for more information.
[E 211110 16:09:41 web:2239] 500 POST /api/kernels (1.5.41.210) 181250.55ms
```

```
$ conda list | grep jupyter
jupyter-client               6.2.0            pypi_0          pypi
jupyter-enterprise-gateway   2.5.1            pypi_0          pypi
jupyter_core                 4.8.1            py38h06a4308_0  defaults
jupyter_server               1.4.1            py38h06a4308_0  defaults
jupyter_telemetry            0.1.0            py_0            defaults
jupyterhub                   1.4.2            py38h06a4308_0  defaults
jupyterlab                   3.2.1            pyhd3eb1b0_1    defaults
jupyterlab_pygments          0.1.2            py_0            defaults
jupyterlab_server            2.8.2            pyhd3eb1b0_0    defaults
```

```
$ yarn logs -applicationId application_1635924651123_123
Unable to get ApplicationState. Attempting to fetch logs directly from the filesystem.
Guessed logs' owner is proxy-user and current user <proxy-user> does not have permission to access /tmp/logs/*/logs/application_1635924651123_123.
Error message found: Permission denied: user=<proxy-user>, access=EXECUTE, inode="/tmp/logs/test":test:hadoop:drwxrwx---
```
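
For what it's worth, naming the application owner explicitly avoids the guessed glob (a sketch; the owner value is whichever user actually submitted the application):

```sh
# Fetch aggregated logs with the app owner named explicitly, so the CLI does
# not glob /tmp/logs/*/... and trip over other users' directories:
yarn logs -applicationId application_1635924651123_123 -appOwner my-user
```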

Environment

  • EG_IMPERSONATION_ENABLED = True
  • KERNEL_USERNAME= my-user
  • HIVE_CONF_DIR = /etc/spark/conf.cloudera.spark_on_yarn/yarn-conf
  • HADOOP_CONF_DIR = /etc/spark/conf.cloudera.spark_on_yarn/yarn-conf
  • SPARK_CONF_DIR = /etc/spark/conf.cloudera.spark_on_yarn

Armadik · Nov 10 '21 13:11

I believe there might be two issues here, but I don't have an environment to test:

  • You need to properly initialize a Kerberos session for the user (kinit user-info)
  • The user you are impersonating needs to be the user in KERNEL_USERNAME, and it also needs permission in some of these places, e.g. by being added to the users group or a similar one.

Another simple way to validate is to make sure you can properly run a job in this environment via spark-submit while impersonating the same user.
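
Something along these lines could serve as that check (a sketch I haven't run; the keytab path, principal, and example jar are placeholders for whatever your environment uses):

```sh
# Authenticate as the gateway's service principal, then submit a trivial job
# while impersonating the kernel user; if this fails, kernel launches will too.
kinit -kt /etc/security/keytabs/eg.service.keytab eg-service@EXAMPLE.COM
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --proxy-user my-user \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100
```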

NOT A CONTRIBUTION

lresende · Nov 10 '21 23:11

@lresende Thanks for the quick response! I can successfully execute the spark-submit command as the user that is configured as the Hadoop proxyuser. The problem arises only when I try to do the same through the kernels.
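
To be precise, the failing path is the kernel-start request itself, which should be equivalent to something like this (a sketch; the gateway URL and kernelspec name are from my setup above):

```sh
# Start a YARN-cluster kernel directly against the gateway, passing the
# impersonated user in the request env, as the notebook client would:
curl -X POST http://enterprise-gateway:8888/api/kernels \
  -H 'Content-Type: application/json' \
  -d '{"name": "spark_python_yarn_cluster", "env": {"KERNEL_USERNAME": "my-user"}}'
```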

It is not entirely clear to me at what point the request for /tmp/logs/*/logs/application_1635924651123_123 is formed.

Should the account under which the Spark process runs be substituted for the special character "*"?
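
A way I could check this, assuming the default yarn.nodemanager.remote-app-log-dir of /tmp/logs (aggregated logs should land under /tmp/logs/<app-owner>/logs/<application-id>, with "*" being only the glob yarn logs uses while guessing the owner):

```sh
# List the per-user aggregated-log directories to see which owner's
# directory actually holds the application (default log dir assumed):
hdfs dfs -ls /tmp/logs
hdfs dfs -ls /tmp/logs/my-user/logs
```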

Armadik · Nov 11 '21 08:11