enterprise_gateway
Error Starting IPython kernel for Spark in Kubernetes mode
Description
When launching a notebook with the Spark - Python (Kubernetes Mode) kernel, Enterprise Gateway fails with Exception in thread "main" java.lang.IllegalArgumentException: basedir must be absolute: ?/.ivy2/local. The full error is below.
Screenshots / Logs
[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Instantiating kernel 'Spark - Python (Kubernetes Mode)' with process proxy: enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy
[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Response socket launched on 'xx.yy.xx.yy:port' using 5.0s timeout
[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Starting kernel: ['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '956248df-391b-4bdd-89a6-ead1b0732661', '--RemoteProcessProxy.response-address', 'xx.yy.xx.yy:port', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Launching kernel: Spark - Python (Kubernetes Mode) with command: ['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '956248df-391b-4bdd-89a6-ead1b0732661', '--RemoteProcessProxy.response-address', 'xx.yy.xx.yy:port', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[W 2019-08-26 18:12:41.858 EnterpriseGatewayApp] Shared namespace has been configured. All kernels will reside in EG namespace: enterprise-gateway
[D 2019-08-26 18:12:41.858 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'LC_ALL': 'en_US.UTF-8', 'KUBERNETES_PORT_53_UDP': 'udp://xx.yy.xx.yy:53', 'LANG': 'en_US.UTF-8', 'EG_SHARED_NAMESPACE': 'True', 'HOSTNAME': 'enterprise-gateway-64f9dc585d-c7xkm', 'EG_ENABLE_TUNNELING': 'False', 'KUBERNETES_PORT_53_UDP_PORT': '53', 'KG_PORT_RETRIES': '0', 'NB_UID': '1000', 'EG_LOG_LEVEL': 'DEBUG', 'KUBERNETES_PORT_53_TCP': 'tcp://xx.yy.xx.yy:53', 'KUBERNETES_PORT_53_TCP_PORT': '53', 'JAVA_HOME': '/usr/lib/jvm/java-8-openjdk-amd64', 'CONDA_DIR': '/opt/conda', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_PORT': '8888', 'CONDA_VERSION': '4.7.10', 'SPARK_VER': '2.4.1', 'KUBERNETES_SERVICE_PORT_DNS': '53', 'KUBERNETES_PORT_53_TCP_ADDR': 'xx.yy.xx.yy', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'KUBERNETES_PORT_443_TCP_ADDR': 'xx.yy.xx.yy', 'EG_CULL_IDLE_TIMEOUT': '36000', 'ENTERPRISE_GATEWAY_SERVICE_HOST': 'xx.yy.xx.yy', 'KUBERNETES_PORT': 'tcp://xx.yy.xx.yy:443', 'KUBERNETES_PORT_53_UDP_ADDR': 'xx.yy.xx.yy', 'PWD': '/usr/local/bin', 'HOME': '/home/jovyan', 'KUBERNETES_SERVICE_PORT_DNS_TCP': '53', 'KERNEL_UID': '1000350000', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_PROTO': 'tcp', 'EG_MIRROR_WORKING_DIRS': 'True', 'KUBERNETES_PORT_53_UDP_PROTO': 'udp', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'DEBIAN_FRONTEND': 'noninteractive', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'EG_KERNEL_LAUNCH_TIMEOUT': '60', 'EG_SSH_PORT': '2122', 'ENTERPRISE_GATEWAY_SERVICE_PORT_HTTP': '8888', 'EG_CULL_INTERVAL': '60', 'SPARK_HOME': '/opt/spark', 'NB_USER': 'jovyan', 'EG_KERNEL_WHITELIST': "['r_kubernetes','python_kubernetes','python_tf_kubernetes','python_tf_gpu_kubernetes','scala_kubernetes','spark_r_kubernetes','spark_python_kubernetes','spark_scala_kubernetes']", 'KUBERNETES_PORT_443_TCP': 'tcp://xx.yy.xx.yy:443', 'EG_CULL_CONNECTED': 'False', 'EG_PORT_RETRIES': '0', 'KERNEL_GID': '1000350000', 'KG_PORT': '8888', 'ENTERPRISE_GATEWAY_SERVICE_PORT': '8888', 'SHELL': '/bin/bash', 'ENTERPRISE_GATEWAY_PORT': 'tcp://xx.yy.xx.yy:8888', 'ENTERPRISE_GATEWAY_PORT_8888_TCP': 'tcp://xx.yy.xx.yy:8888', 'EG_PORT': '8888', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_ADDR': 'xx.yy.xx.yy', 'SHLVL': '0', 'LANGUAGE': 'en_US.UTF-8', 'EG_KERNEL_CLUSTER_ROLE': 'kernel-controller', 'KUBERNETES_SERVICE_PORT': '443', 'EG_NAMESPACE': 'enterprise-gateway', 'NB_GID': '100', 'KG_IP': '0.0.0.0', 'PATH': '/opt/conda/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'EG_IP': '0.0.0.0', 'KUBERNETES_SERVICE_HOST': 'xx.yy.xx.yy', 'MINICONDA_VERSION': '4.6.14', 'KUBERNETES_PORT_53_TCP_PROTO': 'tcp', 'KERNEL_USERNAME': 'jovyan', 'KERNEL_LAUNCH_TIMEOUT': '40', 'KERNEL_WORKING_DIR': '/home/jovyan/work', 'SPARK_OPTS': '--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} --deploy-mode cluster --name ${KERNEL_USERNAME}-${KERNEL_ID} --conf spark.kubernetes.namespace=${KERNEL_NAMESPACE} --conf spark.kubernetes.driver.label.app=enterprise-gateway --conf spark.kubernetes.driver.label.kernel_id=${KERNEL_ID} --conf spark.kubernetes.driver.label.component=kernel --conf spark.kubernetes.executor.label.app=enterprise-gateway --conf spark.kubernetes.executor.label.kernel_id=${KERNEL_ID} --conf spark.kubernetes.executor.label.component=kernel --conf spark.kubernetes.driver.container.image=${KERNEL_IMAGE} --conf spark.kubernetes.executor.container.image=${KERNEL_EXECUTOR_IMAGE} --conf spark.kubernetes.authenticate.driver.serviceAccountName=${KERNEL_SERVICE_ACCOUNT_NAME} --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.pyspark.pythonVersion=3 ${KERNEL_EXTRA_SPARK_OPTS}', 'LAUNCH_OPTS': '', 'KERNEL_GATEWAY': '1', 'KERNEL_POD_NAME': 'jovyan-956248df-391b-4bdd-89a6-ead1b0732661', 'KERNEL_SERVICE_ACCOUNT_NAME': 'default', 'KERNEL_NAMESPACE': 'enterprise-gateway', 'KERNEL_IMAGE': 'docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev', 'KERNEL_EXECUTOR_IMAGE': 'docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev', 'EG_MIN_PORT_RANGE_SIZE': '1000', 'EG_MAX_PORT_RANGE_RETRIES': '5', 'KERNEL_ID': '956248df-391b-4bdd-89a6-ead1b0732661', 'KERNEL_LANGUAGE': 'python', 'EG_IMPERSONATION_ENABLED': 'False'}
[I 2019-08-26 18:12:41.864 EnterpriseGatewayApp] KubernetesProcessProxy: kernel launched. Kernel image: docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev, KernelID: 956248df-391b-4bdd-89a6-ead1b0732661, cmd: '['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '956248df-391b-4bdd-89a6-ead1b0732661', '--RemoteProcessProxy.response-address', 'xx.yy.xx.yy:port', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']'
Starting IPython kernel for Spark in Kubernetes mode on behalf of user jovyan
- eval exec /opt/spark/bin/spark-submit '--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} --deploy-mode cluster --name ${KERNEL_USERNAME}-${KERNEL_ID} --conf spark.kubernetes.namespace=${KERNEL_NAMESPACE} --conf spark.kubernetes.driver.label.app=enterprise-gateway --conf spark.kubernetes.driver.label.kernel_id=${KERNEL_ID} --conf spark.kubernetes.driver.label.component=kernel --conf spark.kubernetes.executor.label.app=enterprise-gateway --conf spark.kubernetes.executor.label.kernel_id=${KERNEL_ID} --conf spark.kubernetes.executor.label.component=kernel --conf spark.kubernetes.driver.container.image=${KERNEL_IMAGE} --conf spark.kubernetes.executor.container.image=${KERNEL_EXECUTOR_IMAGE} --conf spark.kubernetes.authenticate.driver.serviceAccountName=${KERNEL_SERVICE_ACCOUNT_NAME} --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.pyspark.pythonVersion=3 ${KERNEL_EXTRA_SPARK_OPTS}' local:///usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py '' --RemoteProcessProxy.kernel-id 956248df-391b-4bdd-89a6-ead1b0732661 --RemoteProcessProxy.response-address xx.yy.xx.yy:port --RemoteProcessProxy.spark-context-initialization-mode lazy
++ exec /opt/spark/bin/spark-submit --master k8s://https://xx.yy.xx.yy:443 --deploy-mode cluster --name jovyan-956248df-391b-4bdd-89a6-ead1b0732661 --conf spark.kubernetes.namespace=enterprise-gateway --conf spark.kubernetes.driver.label.app=enterprise-gateway --conf spark.kubernetes.driver.label.kernel_id=956248df-391b-4bdd-89a6-ead1b0732661 --conf spark.kubernetes.driver.label.component=kernel --conf spark.kubernetes.executor.label.app=enterprise-gateway --conf spark.kubernetes.executor.label.kernel_id=956248df-391b-4bdd-89a6-ead1b0732661 --conf spark.kubernetes.executor.label.component=kernel --conf spark.kubernetes.driver.container.image=docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev --conf spark.kubernetes.executor.container.image=docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev --conf spark.kubernetes.authenticate.driver.serviceAccountName=default --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.pyspark.pythonVersion=3 local:///usr/local/bin/kernel-launchers/python/scripts/launch_ipykernel.py --RemoteProcessProxy.kernel-id 956248df-391b-4bdd-89a6-ead1b0732661 --RemoteProcessProxy.response-address xx.yy.xx.yy:port --RemoteProcessProxy.spark-context-initialization-mode lazy
[D 2019-08-26 18:12:42.385 EnterpriseGatewayApp] 1: Waiting to connect to k8s pod in namespace 'enterprise-gateway'. Name: '', Status: 'None', Pod IP: 'None', KernelID: '956248df-391b-4bdd-89a6-ead1b0732661'
Exception in thread "main" java.lang.IllegalArgumentException: basedir must be absolute: ?/.ivy2/local
at org.apache.ivy.util.Checks.checkAbsolute(Checks.java:48)
at org.apache.ivy.plugins.repository.file.FileRepository.setBaseDir(FileRepository.java:135)
at org.apache.ivy.plugins.repository.file.FileRepository.<init>(FileRepository.java:44)
at org.apache.spark.deploy.SparkSubmitUtils$.createRepoResolvers(SparkSubmit.scala:1060)
at org.apache.spark.deploy.SparkSubmitUtils$.buildIvySettings(SparkSubmit.scala:1146)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:51)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[D 2019-08-26 18:12:42.897 EnterpriseGatewayApp] 2: Waiting to connect to k8s pod in namespace 'enterprise-gateway'. Name: '', Status: 'None', Pod IP: 'None', KernelID: '956248df-391b-4bdd-89a6-ead1b0732661'
[E 2019-08-26 18:12:42.898 EnterpriseGatewayApp] Error occurred during launch of KernelID: 956248df-391b-4bdd-89a6-ead1b0732661. Check Enterprise Gateway log for more information.
[E 190826 18:12:42 web:2246] 500 POST /api/kernels (xx.yy.xx.yy) 1046.96ms
Environment
- Enterprise Gateway Version [dev]
- Platform: [Kubernetes]
Thanks for bringing this to our attention. Unfortunately, I'm unable to reproduce this issue right now but have a couple questions/suggestions to try...
- Is this a "vanilla" kubernetes environment or something more like Open Shift or AWS, etc?
- Is there a reason you want to use the shared namespace (so that all kernel pods reside in the Enterprise Gateway namespace)? I ask because I'm seeing some Spark-related issues when sharing the namespace, although those relate to accessing the context information, which happens well past the point where any kind of .ivy2 directory would be accessed.
- Can you try setting EG_SHARED_NAMESPACE=False (the default)? This is more of a data point, but it's also a possible solution, since we don't recommend sharing the EG namespace.
- You might try following the advice in this SO post and add something like --conf spark.jars.ivy=/tmp to the SPARK_OPTS in your kernel.json file (a sketch of this is shown after the list).
- Have you altered the kernel-spark-py image for your usage? If so, can you try with an elyra/kernel-spark-py image from Docker Hub?
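For concreteness, here is one way the spark.jars.ivy suggestion could be applied. This is only a sketch: it assumes jq is available, that the kernelspec lives at the path implied by the run.sh location in the logs above, and that SPARK_OPTS is defined in the kernelspec's env stanza; adjust for your deployment.

```bash
# Hypothetical sketch: append the Ivy override to the SPARK_OPTS entry in the
# kernelspec's env stanza. The kernel.json path mirrors the run.sh path from the
# logs above; back up the file before editing.
KERNEL_JSON=/usr/local/share/jupyter/kernels/spark_python_kubernetes/kernel.json
jq '.env.SPARK_OPTS += " --conf spark.jars.ivy=/tmp"' "$KERNEL_JSON" > /tmp/kernel.json \
  && mv /tmp/kernel.json "$KERNEL_JSON"
```

Depending on how your kernelspecs are deployed (baked into the EG image vs. mounted), you may need to rebuild the image or restart Enterprise Gateway for the change to take effect.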
I've attached to a container running the kernel pod image and there is no ".ivy*" directory anywhere in the container. So, for whatever reason, my environment is not encountering this issue.
Closing due to lack of response - please reopen with additional information if necessary.
So, an easy way to reproduce this is to set the user in the Docker image to 185 (anonymous)... Then, my understanding is that there were two issues:
- a missing $HOME env var, which causes the runtime to try to create the ivy2 directory at the root of the file system: ?/.ivy2/local (see the check sketched below)
- no access to create the .ivy2 folder
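For illustration, the missing-HOME effect can be checked with something like the following. This is a sketch only; the image tag is simply the one referenced later in this thread, and it assumes bash and java are available inside the image.

```bash
# Sketch: run the kernel image as UID 185 (a UID with no passwd entry) and see
# what the JVM reports for user.home. Ivy builds its default cache and local
# repository from ${user.home}/.ivy2, so a "?" here lines up with the
# "?/.ivy2/local" basedir error above.
docker run --rm --user 185 --entrypoint bash elyra/kernel-spark-py:3.1.0 \
  -c 'echo "HOME=$HOME"; java -XshowSettings:properties -version 2>&1 | grep user.home'
```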
My thoughts on fixing this were to:
- Set $HOME and create $HOME/.ivy2 with wide-open access, which did not work.

What really worked was to add the following config to the SPARK_OPTS:
--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp"
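For reference, here is roughly how that looks when appended to the SPARK_OPTS value from the logs above. This is a sketch in shell form; in kernel.json the same text goes into the env stanza's SPARK_OPTS string, and the exact quoting depends on how your run.sh evals SPARK_OPTS.

```bash
# Sketch only: append the workaround so that both -D flags reach the driver JVM
# as a single spark.driver.extraJavaOptions value.
SPARK_OPTS="${SPARK_OPTS} --conf spark.driver.extraJavaOptions=\"-Divy.cache.dir=/tmp -Divy.home=/tmp\""
```

With ivy.home and ivy.cache.dir pointed at /tmp, Ivy no longer derives its directories from user.home, so the "?" fallback stops mattering.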
I will be working on a patch to configure that in the sample kernelspecs
Thanks @lresende!
> set the user in the docker image as 185 (anonymous)
How exactly is this done? Is this something in the launch script or the image itself? I will need to reproduce the issue.
Does this apply to all spark-based kernels regardless of platform (k8s, Hadoop/YARN, ssh)?
Thanks @lresende. I can reproduce this using the following Dockerfile:
FROM elyra/kernel-spark-py:3.1.0
USER 185
However, I'm not sure whether HOME is what the ?/.ivy2/local reference is relative to, or whether it's WORKDIR, since that will be the current working directory. In the Spark images, we have HOME=/home/jovyan but WORKDIR=$SPARK_HOME/work-dir, where SPARK_HOME is /opt/spark.
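One way to check that distinction directly is sketched below; the throwaway image tag is mine, and it assumes bash and java are available in the image.

```bash
# Sketch: build the two-line repro Dockerfile above, then print HOME, the current
# working directory, and the JVM's user.home / user.dir to see which one the "?"
# in the error corresponds to.
docker build -t kernel-spark-py-185 .
docker run --rm --entrypoint bash kernel-spark-py-185 \
  -c 'id; echo "HOME=$HOME"; pwd; java -XshowSettings:properties -version 2>&1 | grep -E "user\.(home|dir)"'
```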
If I adjust the permissions in either location, I still get the issue, so perhaps updating SPARK_OPTS is the way to go.
Thanks for looking into this!