"[FATAL tini (16)] exec driver-py failed: No such file or directory" with spark-py:v3.0.0 container image
(Re-)opening this issue as it was previously created but closed here. I'm more than happy to close this one in favor of reopening the other one if someone suggests so.
Kubernetes version: 1.15.11
Helm version: 3.0.2
Note: the same steps were tested with the 2.4.5 version of spark-py-pi.yaml (spark-py:v2.4.5) and there were no issues.
Steps to recreate

- Create the namespace `spark-operator`
- Install the operator from the incubator/sparkoperator Helm chart. I pulled the chart locally and then ran the below command:

  ```
  helm upgrade --install spark-operator sparkoperator/ --namespace spark-operator --set sparkJobNamespace=spark-operator --set enableWebhook=true
  ```

- Run the pyspark-pi example. The spark-py-pi.yaml definition (note the update to the namespace field and serviceAccount):
  ```yaml
  apiVersion: "sparkoperator.k8s.io/v1beta2"
  kind: SparkApplication
  metadata:
    name: pyspark-pi
    namespace: spark-operator
  spec:
    type: Python
    pythonVersion: "2"
    mode: cluster
    image: "gcr.io/spark-operator/spark-py:v3.0.0"
    imagePullPolicy: Always
    mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
    sparkVersion: "3.0.0"
    restartPolicy:
      type: OnFailure
      onFailureRetries: 3
      onFailureRetryInterval: 10
      onSubmissionFailureRetries: 5
      onSubmissionFailureRetryInterval: 20
    driver:
      cores: 1
      coreLimit: "1200m"
      memory: "512m"
      labels:
        version: 3.0.0
      serviceAccount: spark-operator-spark
    executor:
      cores: 1
      instances: 1
      memory: "512m"
      labels:
        version: 3.0.0
  ```
- Launch the PySpark application:

  ```
  kubectl apply -f spark-py-pi.yaml -n spark-operator
  ```

- Check the driver logs:

  ```
  [FATAL tini (16)] exec driver-py failed: No such file or directory
  ```
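For context on what that FATAL line means: tini is trying to `exec` a literal program named `driver-py`, which does not exist in the 3.0.0 image. A minimal local sketch of the same failure mode (assuming a POSIX shell and that `driver-py` is not on your PATH):

```shell
# exec-ing a command that does not exist fails just like tini's exec inside
# the container. Run it in a subshell so the current shell survives the
# failed exec.
( exec driver-py ) 2>/dev/null || echo "exec driver-py failed"
```

This prints `exec driver-py failed`, mirroring how the container dies before any Spark code runs.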
Which version of the operator are you running? Particularly, which image are you using?
Installed from https://github.com/helm/charts/tree/master/incubator/sparkoperator helm chart.
Currently deployed spark operator version: v1beta2-1.2.0-3.0.0
What is weird is that I now notice the spark-operator image appears to be using Spark 2.4.5:
Image: gcr.io/spark-operator/spark-operator:v1beta2-1.1.2-2.4.5
Even with the operator running on 2.4.5 image, the scala spark 3.0.0 examples were working fine.
So the incubator/sparkoperator chart claims to use operatorVersion: v1beta2-1.2.0-3.0.0, but when you pull the chart (helm pull operator/sparkoperator) it actually uses v1beta2-1.1.2-2.4.5. Is this a bug in the chart or in the Helm chart hub?
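One quick way to spot this kind of mismatch is to compare the Spark version suffix of the operator image tag against the app's `sparkVersion`. A minimal sketch, using the example values from this thread (in practice you would read the tag from `kubectl get deploy` and the version from the SparkApplication spec):

```shell
# Example values from this thread; substitute your own.
operator_image_tag="v1beta2-1.1.2-2.4.5"  # tag on the deployed operator image
app_spark_version="3.0.0"                 # sparkVersion from the SparkApplication spec

# Operator image tags end in the Spark version they were built against,
# so strip everything up to the last "-".
operator_spark_version="${operator_image_tag##*-}"

if [ "$operator_spark_version" != "$app_spark_version" ]; then
  echo "mismatch: operator built for Spark $operator_spark_version, app uses Spark $app_spark_version"
fi
```

With the values above this reports a mismatch between 2.4.5 and 3.0.0, which matches the symptom described in this thread.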
I am also facing this issue. The spark-pi example worked fine. But the spark-py-pi gives this error:
[FATAL tini (15)] exec driver-py failed: No such file or directory
The docker image is: gcr.io/spark-operator/spark-py:v3.0.0
Similar issue as well; the change in entrypoint.sh that appears to lead to the issue above is here: https://github.com/apache/spark/pull/23655.
That commit and the commit after explain that spark-submit should be able to handle all the PySpark dependencies, but it is unclear how PySpark-specific arguments can then be passed in the K8s command.
If I replace entrypoint.sh with the version prior to this commit, it appears to work again, but this is a temporary patch at best, and I obviously would not recommend it as a long-term (or even short-term) solution except for debugging.
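To illustrate why that change bites here, a simplified sketch of the dispatch shape (an assumption based on the PR linked above, not the real entrypoint.sh): the 3.0.x entrypoint only recognizes `driver` and `executor`, so an operator that still submits `driver-py` hits the fallthrough, and tini ends up exec-ing a literal program named `driver-py`.

```shell
# Simplified sketch of the Spark 3.0.x entrypoint dispatch (assumption):
# "driver-py" is no longer a recognized case, so it falls through to the
# default branch, which execs its arguments verbatim.
dispatch() {
  case "$1" in
    driver)   echo "run spark-submit for the driver" ;;
    executor) echo "run the executor backend" ;;
    *)        echo "fallthrough: exec $* (fails if $1 is not a real binary)" ;;
  esac
}

dispatch driver      # recognized
dispatch driver-py   # fallthrough -> the FATAL tini error in this issue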
We are facing the same issue as well. Is there a fix for this?
Any fix? I also get this error when I launch Python applications.
Has anyone tried manually building and using a 3.0.1 spark-py base image?
This looks like a compatibility issue between Spark 3.0.x and Spark 2.4.x due to changes to the Dockerfile entrypoint. This issue seems to show up if the operator image is based on Spark 2.4.x whereas the app image is based on Spark 3.0.x.
BTW: we have recently migrated the operator chart into this repo because of the deprecation of the helm/charts repo.
> Has anyone tried manually building and using a 3.0.1 spark-py base image?
I have tried manually building a base spark image for 3.0.1 and it worked fine with the operator.
@kaaquist - so my issue came from retrieving the Helm chart via helm pull operator/sparkoperator. Something wasn't right there. You don't have to rebuild the image or anything; it worked better to just pull the git repository and do a helm install with what is retrieved there, as it references the correct images. Somehow the git repo and the Helm repo were using two different charts, I believe.
@akuzni2, thanks. I also used the Helm chart from helm pull operator/sparkoperator, so it is good to know that it will work if I just use the git repo. I rewrote the app in Scala, and then I did not have the problem, but I guess other people on the team might need to be able to use this trick. Thanks!
Apologies but I need some more clarification because I'm a helm newbie.
> it worked better to just pull the git repository and do a helm install with what is retrieved
@akuzni2 I assume you are referring to https://github.com/GoogleCloudPlatform/spark-on-k8s-operator when you say 'repository'. If so, can you please point me to install docs that discuss installing using this method? All the install docs I see assume you are installing from the helm repo.
Thank you.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.